Sympathetic Autonomic Nervous System Analysis from Heart Rate Variability Metrics¶
Abstract¶
This analysis examines sympathetic nervous system (SNS) activity through heart rate variability (HRV) metrics in a longitudinal study design. The sympathetic component of the autonomic nervous system is primarily assessed through frequency-domain measures including low-frequency (LF) power (0.04-0.15 Hz), the LF/HF ratio representing sympathovagal balance, and normalized LF power (LFnu). Additionally, time-domain metrics such as SDNN and nonlinear measures including Poincaré SD2 provide complementary insights into sympathetic modulation. This comprehensive analysis employs rigorous statistical methods to evaluate temporal changes, inter-subject variability, and correlational patterns in sympathetic HRV indices across multiple recording sessions.
Introduction¶
Heart rate variability (HRV) analysis provides non-invasive assessment of autonomic nervous system function, with specific frequency bands reflecting different physiological mechanisms. The low-frequency (LF) band (0.04-0.15 Hz) contains contributions from both sympathetic and parasympathetic modulation, while the LF/HF ratio serves as an indicator of sympathovagal balance. Normalized LF power (LFnu) represents the relative contribution of sympathetic activity. Time-domain measures such as SDNN reflect overall variability including sympathetic influences, while nonlinear measures like Poincaré SD2 capture long-term heart rate dynamics associated with sympathetic regulation.
# Cell 1: Import Libraries and Setup
import sqlite3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import normaltest, levene, ttest_ind, pearsonr, spearmanr, f_oneway
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import warnings
warnings.filterwarnings('ignore')
# Configure plotting parameters
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (14, 10)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 10
print("Libraries imported successfully")
print("Analysis configured for sympathetic ANS HRV metrics")
Libraries imported successfully Analysis configured for sympathetic ANS HRV metrics
# Cell 2: Data File Discovery and Structure Exploration
import os
import glob
# Define the data directory and CSV files
data_dir = r'C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder'
csv_files = [
'T01_Mara.csv',
'T02_Laura.csv',
'T03_Nancy.csv',
'T04_Michelle.csv',
'T05_Felicitas.csv',
'T06_Mara_Selena.csv',
'T07_Geraldinn.csv',
'T08_Karina.csv'
]
# Create full paths
file_paths = [os.path.join(data_dir, filename) for filename in csv_files]
print("=== DATA FILES DISCOVERY ===")
print(f"Data directory: {data_dir}")
print(f"Expected CSV files: {len(csv_files)}")
# Check which files exist
existing_files = []
for i, file_path in enumerate(file_paths):
if os.path.exists(file_path):
existing_files.append((csv_files[i], file_path))
print(f"✓ Found: {csv_files[i]}")
else:
print(f"✗ Missing: {csv_files[i]}")
print(f"\nTotal existing files: {len(existing_files)}")
if existing_files:
# Load first file to examine structure
first_file = existing_files[0][1]
print(f"\nExamining structure of: {existing_files[0][0]}")
try:
sample_df = pd.read_csv(first_file, nrows=5)
print(f"\nSample data structure (first 5 rows):")
print(sample_df.head())
print(f"\nColumn names ({len(sample_df.columns)} total):")
for i, col in enumerate(sample_df.columns):
print(f" {i+1:2d}. {col}")
print(f"\nData types:")
print(sample_df.dtypes)
except Exception as e:
print(f"Error reading {first_file}: {str(e)}")
else:
print("ERROR: No data files found!")
=== DATA FILES DISCOVERY === Data directory: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder Expected CSV files: 8 ✓ Found: T01_Mara.csv ✓ Found: T02_Laura.csv ✓ Found: T03_Nancy.csv ✓ Found: T04_Michelle.csv ✓ Found: T05_Felicitas.csv ✓ Found: T06_Mara_Selena.csv ✓ Found: T07_Geraldinn.csv ✓ Found: T08_Karina.csv Total existing files: 8 Examining structure of: T01_Mara.csv Sample data structure (first 5 rows): Sol user source_file time [s/1000] breathing_rate [rpm] \ 0 2 T01_Mara record_4494.csv 1.732544e+12 NaN 1 2 T01_Mara record_4494.csv 1.732544e+12 NaN 2 2 T01_Mara record_4494.csv 1.732544e+12 0.0 3 2 T01_Mara record_4494.csv 1.732544e+12 NaN 4 2 T01_Mara record_4494.csv 1.732544e+12 NaN SPO2 [%] PTT [s] minute_ventilation [mL/min] systolic_pressure [mmHg] \ 0 NaN NaN NaN NaN 1 NaN 0.206 NaN NaN 2 100.0 NaN 0.0 260.0 3 NaN 0.206 NaN NaN 4 NaN 0.206 NaN NaN energy_mifflin_keytel [watt] sleep_position [NA] temperature [NA] \ 0 NaN NaN NaN 1 NaN NaN NaN 2 0.0 4.0 6.703125 3 NaN NaN NaN 4 NaN NaN NaN activity [g] temperature_celcius [C] heart_rate [bpm] cadence [spm] 0 NaN NaN NaN NaN 1 NaN NaN NaN NaN 2 0.0 33.203125 70.0 0.0 3 NaN NaN NaN NaN 4 NaN NaN NaN NaN Column names (16 total): 1. Sol 2. user 3. source_file 4. time [s/1000] 5. breathing_rate [rpm] 6. SPO2 [%] 7. PTT [s] 8. minute_ventilation [mL/min] 9. systolic_pressure [mmHg] 10. energy_mifflin_keytel [watt] 11. sleep_position [NA] 12. temperature [NA] 13. activity [g] 14. temperature_celcius [C] 15. heart_rate [bpm] 16. cadence [spm] Data types: Sol int64 user object source_file object time [s/1000] float64 breathing_rate [rpm] float64 SPO2 [%] float64 PTT [s] float64 minute_ventilation [mL/min] float64 systolic_pressure [mmHg] float64 energy_mifflin_keytel [watt] float64 sleep_position [NA] float64 temperature [NA] float64 activity [g] float64 temperature_celcius [C] float64 heart_rate [bpm] float64 cadence [spm] float64 dtype: object
# Cell 3: Data Loading and Sympathetic Metrics Identification
if existing_files:
print("=== LOADING AND COMBINING CSV DATA ===")
# Load and combine all CSV files
all_dataframes = []
for filename, filepath in existing_files:
try:
print(f"Loading: {filename}")
# Extract subject identifier from filename
subject_id = filename.replace('.csv', '').replace('_', ' ')
# Load CSV file
df = pd.read_csv(filepath)
# Add subject identifier
df['Subject'] = subject_id
df['Subject_ID'] = filename.replace('.csv', '')
print(f" - Shape: {df.shape}")
print(f" - Subject: {subject_id}")
all_dataframes.append(df)
except Exception as e:
print(f" - ERROR loading {filename}: {str(e)}")
if all_dataframes:
# Combine all dataframes
hrv_data = pd.concat(all_dataframes, ignore_index=True, sort=False)
print(f"\nCombined dataset shape: {hrv_data.shape}")
print(f"Total subjects: {hrv_data['Subject'].nunique()}")
# Define sympathetic HRV metrics based on scientific literature
sympathetic_patterns = {
'LF_Power': ['lf ms2', 'lf_ms2', 'lf power', 'lf_power', 'low frequency power', 'lf (ms2)', 'lf(ms2)'],
'LF_HF_Ratio': ['lf/hf', 'lf_hf', 'lfhf', 'lf hf ratio', 'lf/hf ratio', 'lf_hf_ratio', 'lf hf', 'lf / hf'],
'LF_Normalized': ['lf nu', 'lf_nu', 'lf normalized', 'lf norm', 'lfnu', 'lf (nu)', 'lf(nu)', 'lf n.u.', 'lf_normalized'],
'SDNN': ['sdnn', 'sdnn ms', 'sdnn_ms', 'sdnn (ms)', 'sdnn(ms)'],
'Total_Power': ['total power', 'total_power', 'tp', 'total power ms2', 'total_power_ms2', 'tp ms2', 'tp(ms2)', 'tp (ms2)'],
'VLF_Power': ['vlf', 'vlf ms2', 'vlf_ms2', 'vlf power', 'vlf (ms2)', 'vlf(ms2)'],
'SD2': ['sd2', 'sd2 ms', 'sd2_ms', 'sd2 (ms)', 'sd2(ms)'],
'HF_Power': ['hf ms2', 'hf_ms2', 'hf power', 'hf_power', 'hf (ms2)', 'hf(ms2)'],
'HF_Normalized': ['hf nu', 'hf_nu', 'hf normalized', 'hf norm', 'hfnu', 'hf (nu)', 'hf(nu)', 'hf n.u.', 'hf_normalized'],
'RMSSD': ['rmssd', 'rmssd ms', 'rmssd_ms', 'rmssd (ms)', 'rmssd(ms)'], # Also sympathetic related
'SD1': ['sd1', 'sd1 ms', 'sd1_ms', 'sd1 (ms)', 'sd1(ms)'] # Poincaré measure
}
# Find matching columns
available_metrics = {}
column_names_lower = [col.lower() for col in hrv_data.columns]
print("\n=== SYMPATHETIC HRV METRICS IDENTIFICATION ===")
for metric_name, patterns in sympathetic_patterns.items():
found = False
for pattern in patterns:
for i, col_lower in enumerate(column_names_lower):
if pattern.lower() in col_lower:
actual_col_name = hrv_data.columns[i]
available_metrics[metric_name] = actual_col_name
print(f"✓ {metric_name}: '{actual_col_name}'")
found = True
break
if found:
break
if not found:
print(f"✗ {metric_name}: Not found")
print(f"\nTotal available sympathetic metrics: {len(available_metrics)}")
# Identify subject and time columns
subject_col = 'Subject' # We created this
time_col = None
# Look for time/session identifiers
for col in hrv_data.columns:
col_lower = col.lower()
if any(keyword in col_lower for keyword in ['sol', 'day', 'session', 'recording_day', 'time', 'date']):
time_col = col
print(f"Found potential time column: {col}")
break
# If no time column found, check if there are multiple rows per subject
if time_col is None:
rows_per_subject = hrv_data.groupby('Subject').size()
print(f"\nRows per subject:")
print(rows_per_subject)
if rows_per_subject.max() > 1:
# Create an index for multiple measurements
hrv_data['Measurement_Index'] = hrv_data.groupby('Subject').cumcount() + 1
time_col = 'Measurement_Index'
print(f"Created measurement index as time variable")
print(f"\nSubject identifier: {subject_col}")
print(f"Time identifier: {time_col}")
# Show sample of the combined data
print(f"\n=== COMBINED DATA SAMPLE ===")
print(hrv_data[['Subject', time_col] + list(available_metrics.values())[:5]].head(10))
else:
print("ERROR: No data could be loaded from CSV files")
hrv_data = pd.DataFrame()
available_metrics = {}
subject_col = None
time_col = None
else:
print("ERROR: No CSV files found")
hrv_data = pd.DataFrame()
available_metrics = {}
subject_col = None
time_col = None
=== LOADING AND COMBINING CSV DATA ===
Loading: T01_Mara.csv
- Shape: (648029, 18)
- Subject: T01 Mara
Loading: T02_Laura.csv
- Shape: (233918, 18)
- Subject: T02 Laura
Loading: T03_Nancy.csv
- Shape: (126588, 12)
- Subject: T03 Nancy
Loading: T04_Michelle.csv
- Shape: (89442, 12)
- Subject: T04 Michelle
Loading: T05_Felicitas.csv
- Shape: (173434, 12)
- Subject: T05 Felicitas
Loading: T06_Mara_Selena.csv
- Shape: (144295, 12)
- Subject: T06 Mara Selena
Loading: T07_Geraldinn.csv
- Shape: (94301, 12)
- Subject: T07 Geraldinn
Loading: T08_Karina.csv
- Shape: (57872, 12)
- Subject: T08 Karina
Combined dataset shape: (1567879, 19)
Total subjects: 8
=== SYMPATHETIC HRV METRICS IDENTIFICATION ===
✗ LF_Power: Not found
✗ LF_HF_Ratio: Not found
✗ LF_Normalized: Not found
✗ SDNN: Not found
✗ Total_Power: Not found
✗ VLF_Power: Not found
✗ SD2: Not found
✗ HF_Power: Not found
✗ HF_Normalized: Not found
✗ RMSSD: Not found
✗ SD1: Not found
Total available sympathetic metrics: 0
Found potential time column: Sol
Subject identifier: Subject
Time identifier: Sol
=== COMBINED DATA SAMPLE ===
Subject Sol
0 T01 Mara 2
1 T01 Mara 2
2 T01 Mara 2
3 T01 Mara 2
4 T01 Mara 2
5 T01 Mara 2
6 T01 Mara 2
7 T01 Mara 2
8 T01 Mara 2
9 T01 Mara 2
# Cell 4: Data Screening and Cleaning
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Load the dataset
try:
df = pd.read_csv('../sympathetic_ans_results.csv')
print("✓ Dataset loaded successfully.")
except FileNotFoundError:
print("✗ Error: '../sympathetic_ans_results.csv' not found.")
# Create a dummy dataframe to prevent further errors
df = pd.DataFrame()
if not df.empty:
print("\n--- 1. Initial Data Overview ---")
print(f"Original dataset shape: {df.shape}")
# --- Screening based on recording length (RR_Count) ---
print("\n--- 2. Screening for Short Recordings ---")
rr_count_threshold = 3000 # Corresponds to ~5 minutes of data
short_recordings = df[df['RR_Count'] < rr_count_threshold]
if not short_recordings.empty:
print(f"Found {len(short_recordings)} recordings with RR_Count < {rr_count_threshold} (potential for unstable spectral estimates):")
print(short_recordings[['Subject', 'Sol', 'RR_Count']])
else:
print(f"✓ No recordings found with RR_Count below the {rr_count_threshold} threshold.")
# --- Create a cleaned dataframe ---
df_cleaned = df[df['RR_Count'] >= rr_count_threshold].copy()
print(f"\nCleaned dataset shape after removing short recordings: {df_cleaned.shape}")
# --- Descriptive Statistics of Cleaned Data ---
print("\n--- 3. Descriptive Statistics of Cleaned Dataset ---")
# Set display options for better readability
pd.set_option('display.float_format', lambda x: f'{x:.2f}' if abs(x) > 0.01 else f'{x:.2e}')
print(df_cleaned.describe())
pd.reset_option('display.float_format')
# --- Check for extreme outliers (using SD) ---
print("\n--- 4. Outlier Check (values > 4 SD from the mean) ---")
numeric_cols = df_cleaned.select_dtypes(include=np.number).columns
for col in numeric_cols:
mean = df_cleaned[col].mean()
std = df_cleaned[col].std()
outliers = df_cleaned[(np.abs(df_cleaned[col] - mean) > 4 * std)]
if not outliers.empty:
print(f"Found {len(outliers)} potential outlier(s) in '{col}':")
print(outliers[['Subject', 'Sol', col]])
✓ Dataset loaded successfully.
--- 1. Initial Data Overview ---
Original dataset shape: (37, 21)
--- 2. Screening for Short Recordings ---
Found 1 recordings with RR_Count < 3000 (potential for unstable spectral estimates):
Subject Sol RR_Count
5 T01 Mara 6 2078
Cleaned dataset shape after removing short recordings: (36, 21)
--- 3. Descriptive Statistics of Cleaned Dataset ---
SDNN RMSSD pNN50 HR_mean HR_std RR_mean RR_std VLF_Power \
count 36.00 36.00 36.00 36.00 36.00 36.00 36.00 36.00
mean 126.26 13.31 0.99 82.71 15.61 744.01 126.26 4206.29
std 47.66 9.15 2.04 13.67 5.58 118.37 47.66 7409.19
min 59.82 5.23 4.01e-03 59.80 8.46 482.82 59.82 755.14
25% 92.88 8.53 0.07 72.91 11.16 672.43 92.88 1547.18
50% 110.66 10.75 0.26 82.23 13.96 729.65 110.66 1998.51
75% 169.44 14.55 0.75 89.24 18.58 822.95 169.44 2977.36
max 239.83 50.57 9.85 124.27 28.64 1003.36 239.83 37087.30
LF_Power HF_Power Total_Power LF_Normalized HF_Normalized \
count 36.00 36.00 36.00 36.00 36.00
mean 461.91 67.40 4735.60 89.86 10.14
std 667.55 153.11 8212.07 6.25 6.25
min 48.28 7.18 932.69 75.90 2.73
25% 204.68 13.76 1818.79 86.07 5.68
50% 269.68 24.21 2221.62 92.53 7.47
75% 374.87 39.19 3385.20 94.32 13.93
max 3584.27 811.74 41483.31 97.27 24.10
LF_HF_Ratio SD1 SD2 SD1_SD2_Ratio Ellipse_Area Sol RR_Count
count 36.00 36.00 36.00 36.00 36.00 36.00 36.00
mean 13.19 9.41 178.26 0.05 6123.32 8.39 28977.94
std 8.69 6.47 67.23 0.02 7128.10 4.49 13886.23
min 3.15 3.70 84.35 0.02 1390.49 2.00 4402.00
25% 6.18 6.03 130.93 0.04 2408.15 4.00 22855.50
50% 12.39 7.60 156.08 0.05 4237.98 9.00 29746.00
75% 16.62 10.29 239.51 0.07 5167.86 12.00 34934.50
max 35.62 35.76 338.78 0.11 35978.72 16.00 65296.00
--- 4. Outlier Check (values > 4 SD from the mean) ---
Found 1 potential outlier(s) in 'RMSSD':
Subject Sol RMSSD
23 T05 Felicitas 13 50.570034
Found 1 potential outlier(s) in 'pNN50':
Subject Sol pNN50
23 T05 Felicitas 13 9.850515
Found 1 potential outlier(s) in 'VLF_Power':
Subject Sol VLF_Power
23 T05 Felicitas 13 37087.29782
Found 1 potential outlier(s) in 'LF_Power':
Subject Sol LF_Power
23 T05 Felicitas 13 3584.269002
Found 1 potential outlier(s) in 'HF_Power':
Subject Sol HF_Power
23 T05 Felicitas 13 811.744156
Found 1 potential outlier(s) in 'Total_Power':
Subject Sol Total_Power
23 T05 Felicitas 13 41483.310978
Found 1 potential outlier(s) in 'SD1':
Subject Sol SD1
23 T05 Felicitas 13 35.759044
Found 1 potential outlier(s) in 'Ellipse_Area':
Subject Sol Ellipse_Area
23 T05 Felicitas 13 35978.718017
# Cell 5: Normality Testing and Distribution Analysis
import scipy.stats as stats
# --- Identify columns for transformation ---
# Power and ratio metrics are often skewed
cols_to_transform = ['VLF_Power', 'LF_Power', 'HF_Power', 'Total_Power', 'LF_HF_Ratio', 'Ellipse_Area']
# Filter out columns that might not exist in the dataframe
cols_to_transform = [col for col in cols_to_transform if col in df_cleaned.columns]
print(f"--- Checking for Skewness in: {cols_to_transform} ---")
# --- Plot original and transformed distributions ---
fig, axes = plt.subplots(len(cols_to_transform), 3, figsize=(20, len(cols_to_transform) * 5))
fig.suptitle('Original vs. Log-Transformed Distributions', fontsize=16, y=1.02)
for i, col in enumerate(cols_to_transform):
# Original Data
sns.histplot(df_cleaned[col], ax=axes[i, 0], kde=True)
axes[i, 0].set_title(f"Original: {col}")
stats.probplot(df_cleaned[col], dist="norm", plot=axes[i, 1])
axes[i, 1].set_title(f"Q-Q Plot: {col}")
# --- Apply Log Transformation ---
# np.log1p(x) is equivalent to np.log(1+x) to handle zeros gracefully
df_cleaned[f'{col}_log'] = np.log1p(df_cleaned[col])
# Transformed Data
sns.histplot(df_cleaned[f'{col}_log'], ax=axes[i, 2], kde=True, color='green')
axes[i, 2].set_title(f"Log-Transformed: {col}")
plt.tight_layout()
plt.show()
print("\n--- Transformation Summary ---")
print("Log transformation (log1p) was applied to power and ratio variables to normalize their distributions.")
print("This step is critical for satisfying the assumptions of the linear mixed-effects models used next.")
print("The new transformed columns are now available in the `df_cleaned` dataframe (e.g., 'LF_HF_Ratio_log').")
--- Checking for Skewness in: ['VLF_Power', 'LF_Power', 'HF_Power', 'Total_Power', 'LF_HF_Ratio', 'Ellipse_Area'] ---
--- Transformation Summary --- Log transformation (log1p) was applied to power and ratio variables to normalize their distributions. This step is critical for satisfying the assumptions of the linear mixed-effects models used next. The new transformed columns are now available in the `df_cleaned` dataframe (e.g., 'LF_HF_Ratio_log').
# Import os module for file path operations
import os
# Ensure required modules are imported for data export
import os
import pandas as pd
# Cell 6: Temporal Analysis of Sympathetic Metrics
print("=== TEMPORAL ANALYSIS OF SYMPATHETIC METRICS ===")
# Check if we have calculated HRV data
if 'sympathetic_df' in locals() and not sympathetic_df.empty:
print("✓ Using calculated HRV data for temporal analysis")
analysis_df = sympathetic_df.copy()
# Check for required columns
if 'Sol' in analysis_df.columns:
print(f"Analyzing temporal trends across {analysis_df['Sol'].nunique()} SOL days")
print(f"SOL range: {analysis_df['Sol'].min()} to {analysis_df['Sol'].max()}")
# Define sympathetic metrics for temporal analysis
sympathetic_metrics = ['LF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']
available_metrics = [m for m in sympathetic_metrics if m in analysis_df.columns]
if available_metrics:
print(f"Analyzing temporal trends for: {available_metrics}")
temporal_stats_results = []
# Create temporal visualization
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
axes = axes.flatten()
for i, metric in enumerate(available_metrics[:6]): # Limit to 6 plots
if i < len(axes):
# Calculate mean and SEM by SOL
sol_stats = analysis_df.groupby('Sol')[metric].agg(['mean', 'std', 'count']).reset_index()
sol_stats['sem'] = sol_stats['std'] / np.sqrt(sol_stats['count'])
# Plot temporal trend with error bars
axes[i].errorbar(sol_stats['Sol'], sol_stats['mean'],
yerr=sol_stats['sem'], marker='o', capsize=5, linewidth=2)
axes[i].set_title(f'{metric} Across SOLs')
axes[i].set_xlabel('SOL (Recording Day)')
axes[i].set_ylabel(metric)
axes[i].grid(True, alpha=0.3)
# Add trend line
from scipy.stats import linregress
try:
slope, intercept, r_value, p_value, std_err = linregress(sol_stats['Sol'], sol_stats['mean'])
trend_line = slope * sol_stats['Sol'] + intercept
axes[i].plot(sol_stats['Sol'], trend_line, '--', alpha=0.8, linewidth=2)
# Store statistical results
temporal_stats_results.append({
'Metric': metric,
'Slope': slope,
'R_squared': r_value**2,
'P_value': p_value,
'Significant': p_value < 0.05
})
# Add statistics to plot
axes[i].text(0.05, 0.95, f'R²={r_value**2:.3f}\np={p_value:.3f}',
transform=axes[i].transAxes,
bbox=dict(boxstyle='round', facecolor='white', alpha=0.8),
verticalalignment='top')
except Exception as e:
print(f"Error calculating trend for {metric}: {e}")
# Remove empty subplots
for i in range(len(available_metrics), len(axes)):
axes[i].remove()
plt.tight_layout()
#plt.suptitle('Temporal Analysis of Sympathetic HRV Metrics', y=0.98, fontweight='bold')
plt.show()
# Print statistical results
print("\n" + "="*50)
print("TEMPORAL TREND ANALYSIS RESULTS")
print("="*50)
if temporal_stats_results:
results_df = pd.DataFrame(temporal_stats_results)
print(results_df.to_string(index=False, float_format='%.4f'))
# Summary
significant_trends = results_df[results_df['Significant']]['Metric'].tolist()
print(f"\nSignificant temporal trends (p < 0.05): {significant_trends}")
if significant_trends:
print("Interpretation:")
for _, row in results_df[results_df['Significant']].iterrows():
direction = "increasing" if row['Slope'] > 0 else "decreasing"
print(f"• {row['Metric']}: {direction} trend (R² = {row['R_squared']:.3f})")
else:
print("• No significant linear temporal trends detected")
else:
print("✗ No temporal analysis results generated")
else:
print("✗ No suitable sympathetic metrics found for temporal analysis")
else:
print("✗ 'Sol' column not found - cannot perform temporal analysis")
temporal_stats_results = []
else:
print("✗ No HRV data available for temporal analysis")
print("Please run Cell 11 first to calculate HRV metrics")
temporal_stats_results = []
print(f"\n✓ Temporal analysis completed. {len(temporal_stats_results)} metrics analyzed.")
=== TEMPORAL ANALYSIS OF SYMPATHETIC METRICS === ✗ No HRV data available for temporal analysis Please run Cell 11 first to calculate HRV metrics ✓ Temporal analysis completed. 0 metrics analyzed.
# Cell 7: Correlation Analysis Between Sympathetic Metrics
print("=== CORRELATION ANALYSIS BETWEEN SYMPATHETIC METRICS ===")
# Check if we have calculated HRV data
if 'sympathetic_df' in locals() and not sympathetic_df.empty:
print("✓ Using calculated HRV data for correlation analysis")
analysis_df = sympathetic_df.copy()
# Define sympathetic metrics for correlation analysis
sympathetic_metrics = ['LF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2',
'HF_Power', 'VLF_Power', 'Total_Power', 'RMSSD']
available_metrics = [m for m in sympathetic_metrics if m in analysis_df.columns and analysis_df[m].notna().sum() > 5]
if len(available_metrics) >= 2:
print(f"Analyzing correlations between {len(available_metrics)} metrics: {available_metrics}")
# Calculate correlation matrix
correlation_matrix_pearson = analysis_df[available_metrics].corr(method='pearson')
correlation_matrix_spearman = analysis_df[available_metrics].corr(method='spearman')
# Create visualization
fig, axes = plt.subplots(1, 3, figsize=(20, 6))
# Plot 1: Pearson correlation heatmap
mask = np.triu(np.ones_like(correlation_matrix_pearson, dtype=bool))
sns.heatmap(correlation_matrix_pearson, mask=mask, annot=True, cmap='RdBu_r', center=0,
ax=axes[0], square=True, fmt='.3f', cbar_kws={"shrink": .8})
axes[0].set_title('Pearson Correlations\n(Linear Relationships)', fontweight='bold')
# Plot 2: Spearman correlation heatmap
sns.heatmap(correlation_matrix_spearman, mask=mask, annot=True, cmap='RdBu_r', center=0,
ax=axes[1], square=True, fmt='.3f', cbar_kws={"shrink": .8})
axes[1].set_title('Spearman Correlations\n(Monotonic Relationships)', fontweight='bold')
# Plot 3: Correlation strength comparison
pearson_vals = correlation_matrix_pearson.values[np.triu_indices_from(correlation_matrix_pearson.values, k=1)]
spearman_vals = correlation_matrix_spearman.values[np.triu_indices_from(correlation_matrix_spearman.values, k=1)]
axes[2].scatter(pearson_vals, spearman_vals, alpha=0.7, s=60)
axes[2].plot([-1, 1], [-1, 1], 'r--', alpha=0.8, linewidth=2)
axes[2].set_xlabel('Pearson Correlation')
axes[2].set_ylabel('Spearman Correlation')
axes[2].set_title('Pearson vs Spearman\nCorrelation Comparison', fontweight='bold')
axes[2].grid(True, alpha=0.3)
axes[2].set_xlim(-1, 1)
axes[2].set_ylim(-1, 1)
plt.tight_layout()
plt.show()
# Statistical significance testing
print("\n" + "="*60)
print("CORRELATION SIGNIFICANCE TESTING")
print("="*60)
correlation_results = []
from scipy.stats import pearsonr, spearmanr
for i, metric1 in enumerate(available_metrics):
for j, metric2 in enumerate(available_metrics):
if i < j: # Avoid duplicate pairs
# Get paired data (remove NaN values)
paired_data = analysis_df[[metric1, metric2]].dropna()
if len(paired_data) >= 5:
try:
# Pearson correlation
r_pearson, p_pearson = pearsonr(paired_data[metric1], paired_data[metric2])
# Spearman correlation
r_spearman, p_spearman = spearmanr(paired_data[metric1], paired_data[metric2])
# Effect size interpretation
def interpret_correlation(r):
abs_r = abs(r)
if abs_r >= 0.7:
return "Large"
elif abs_r >= 0.5:
return "Medium"
elif abs_r >= 0.3:
return "Small"
else:
return "Negligible"
correlation_results.append({
'Metric_1': metric1,
'Metric_2': metric2,
'N_pairs': len(paired_data),
'Pearson_r': r_pearson,
'Pearson_p': p_pearson,
'Spearman_r': r_spearman,
'Spearman_p': p_spearman,
'Effect_Size': interpret_correlation(r_pearson),
'Significant_Pearson': p_pearson < 0.05,
'Significant_Spearman': p_spearman < 0.05
})
except Exception as e:
print(f"Error calculating correlation for {metric1}-{metric2}: {e}")
# Display results
if correlation_results:
results_df = pd.DataFrame(correlation_results)
# Show only significant correlations
significant_pearson = results_df[results_df['Significant_Pearson']]
significant_spearman = results_df[results_df['Significant_Spearman']]
print("SIGNIFICANT PEARSON CORRELATIONS (p < 0.05):")
if not significant_pearson.empty:
display_cols = ['Metric_1', 'Metric_2', 'Pearson_r', 'Pearson_p', 'Effect_Size', 'N_pairs']
print(significant_pearson[display_cols].to_string(index=False, float_format='%.3f'))
else:
print("No significant Pearson correlations found")
print(f"\nSIGNIFICANT SPEARMAN CORRELATIONS (p < 0.05):")
if not significant_spearman.empty:
display_cols = ['Metric_1', 'Metric_2', 'Spearman_r', 'Spearman_p', 'Effect_Size', 'N_pairs']
print(significant_spearman[display_cols].to_string(index=False, float_format='%.3f'))
else:
print("No significant Spearman correlations found")
# Summary statistics
print(f"\n" + "="*50)
print("CORRELATION ANALYSIS SUMMARY")
print("="*50)
print(f"Total metric pairs analyzed: {len(results_df)}")
print(f"Significant Pearson correlations: {significant_pearson.shape[0]}")
print(f"Significant Spearman correlations: {significant_spearman.shape[0]}")
# Effect size distribution
effect_counts = results_df['Effect_Size'].value_counts()
print(f"\nEffect size distribution:")
for effect, count in effect_counts.items():
print(f"• {effect}: {count} pairs")
# Strongest correlations
if not results_df.empty:
strongest_positive = results_df.loc[results_df['Pearson_r'].idxmax()]
strongest_negative = results_df.loc[results_df['Pearson_r'].idxmin()]
print(f"\nStrongest positive correlation:")
print(f"• {strongest_positive['Metric_1']} - {strongest_positive['Metric_2']}: r = {strongest_positive['Pearson_r']:.3f}")
print(f"Strongest negative correlation:")
print(f"• {strongest_negative['Metric_1']} - {strongest_negative['Metric_2']}: r = {strongest_negative['Pearson_r']:.3f}")
else:
print("✗ No correlation results generated")
else:
print(f"✗ Insufficient metrics for correlation analysis (need ≥2, found {len(available_metrics)})")
correlation_results = []
else:
print("✗ No HRV data available for correlation analysis")
print("Please run Cell 11 first to calculate HRV metrics")
correlation_results = []
print(f"\n✓ Correlation analysis completed. {len(correlation_results)} metric pairs analyzed.")
=== CORRELATION ANALYSIS BETWEEN SYMPATHETIC METRICS === ✗ No HRV data available for correlation analysis Please run Cell 11 first to calculate HRV metrics ✓ Correlation analysis completed. 0 metric pairs analyzed.
# Cell 8: Advanced Visualizations and Clinical Summary
print("=== ADVANCED VISUALIZATIONS AND CLINICAL SUMMARY ===")
# Check if we have calculated HRV data
if 'sympathetic_df' in locals() and not sympathetic_df.empty:
print("✓ Using calculated HRV data for advanced visualizations")
analysis_df = sympathetic_df.copy()
# Define sympathetic metrics
sympathetic_metrics = ['LF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']
available_metrics = [m for m in sympathetic_metrics if m in analysis_df.columns]
if available_metrics:
print(f"Creating visualizations for: {available_metrics}")
# Initialize global summary variables if not already defined
if 'temporal_stats_results' not in locals():
temporal_stats_results = []
if 'correlation_results' not in locals():
correlation_results = []
if 'normality_results' not in locals():
normality_results = []
# Create comprehensive visualization dashboard
fig = plt.figure(figsize=(20, 24))
# 1. Subject-specific box plots
plt.subplot(4, 3, 1)
if len(available_metrics) > 0 and 'Subject' in analysis_df.columns:
metric_to_plot = available_metrics[0] # Use first available metric
sns.boxplot(data=analysis_df, x='Subject', y=metric_to_plot)
plt.xticks(rotation=45)
plt.title(f'{metric_to_plot} by Subject\n(Individual Profiles)', fontweight='bold')
plt.grid(True, alpha=0.3)
# 2. Temporal trend analysis
plt.subplot(4, 3, 2)
if 'Sol' in analysis_df.columns and len(available_metrics) > 0:
metric = available_metrics[0]
sol_means = analysis_df.groupby('Sol')[metric].mean()
sol_stds = analysis_df.groupby('Sol')[metric].std()
plt.errorbar(sol_means.index, sol_means.values, yerr=sol_stds.values,
marker='o', capsize=5, linewidth=2)
plt.title(f'{metric} Temporal Changes\n(Mean ± SD)', fontweight='bold')
plt.xlabel('SOL (Recording Day)')
plt.ylabel(metric)
plt.grid(True, alpha=0.3)
# 3. PCA Analysis
plt.subplot(4, 3, 3)
if len(available_metrics) >= 2:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Prepare data for PCA
pca_data = analysis_df[available_metrics].dropna()
if len(pca_data) > 5 and len(available_metrics) >= 2:
scaler = StandardScaler()
scaled_data = scaler.fit_transform(pca_data)
pca = PCA(n_components=min(2, len(available_metrics)))
pca_result = pca.fit_transform(scaled_data)
plt.scatter(pca_result[:, 0], pca_result[:, 1], alpha=0.6, s=50)
plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.1%} variance)')
if pca.n_components_ > 1:
plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.1%} variance)')
plt.title('PCA of Sympathetic Metrics\n(Dimensionality Reduction)', fontweight='bold')
plt.grid(True, alpha=0.3)
# 4. Distribution comparisons
plt.subplot(4, 3, 4)
if len(available_metrics) > 0:
metric = available_metrics[0]
plt.hist(analysis_df[metric].dropna(), bins=20, alpha=0.7, edgecolor='black')
plt.axvline(analysis_df[metric].mean(), color='red', linestyle='--',
label=f'Mean: {analysis_df[metric].mean():.2f}')
plt.axvline(analysis_df[metric].median(), color='orange', linestyle='--',
label=f'Median: {analysis_df[metric].median():.2f}')
plt.title(f'{metric} Distribution\n(Central Tendency)', fontweight='bold')
plt.xlabel(metric)
plt.ylabel('Frequency')
plt.legend()
plt.grid(True, alpha=0.3)
# 5. Correlation network visualization
plt.subplot(4, 3, 5)
if len(available_metrics) >= 2:
corr_matrix = analysis_df[available_metrics].corr()
# Create network-style visualization
from matplotlib.patches import Circle
import matplotlib.patches as mpatches
n_metrics = len(available_metrics)
angles = np.linspace(0, 2*np.pi, n_metrics, endpoint=False)
# Plot nodes (metrics)
for i, (angle, metric) in enumerate(zip(angles, available_metrics)):
x = np.cos(angle)
y = np.sin(angle)
plt.scatter(x, y, s=200, c='lightblue', edgecolor='black', zorder=3)
plt.text(x*1.2, y*1.2, metric, ha='center', va='center', fontsize=8, fontweight='bold')
# Draw correlation lines
for j, (angle2, metric2) in enumerate(zip(angles, available_metrics)):
if i < j: # Avoid duplicate lines
corr_val = corr_matrix.loc[metric, metric2]
if abs(corr_val) > 0.3: # Only show moderate to strong correlations
x2 = np.cos(angle2)
y2 = np.sin(angle2)
# Line thickness based on correlation strength
linewidth = abs(corr_val) * 3
color = 'red' if corr_val > 0 else 'blue'
alpha = abs(corr_val)
plt.plot([x, x2], [y, y2], color=color, linewidth=linewidth,
alpha=alpha, zorder=1)
plt.xlim(-1.5, 1.5)
plt.ylim(-1.5, 1.5)
plt.title('Correlation Network\n(|r| > 0.3 shown)', fontweight='bold')
plt.axis('off')
# 6. Multi-metric temporal comparison
plt.subplot(4, 3, 6)
if 'Sol' in analysis_df.columns and len(available_metrics) >= 2:
from sklearn.preprocessing import MinMaxScaler
# Normalize metrics for comparison
scaler = MinMaxScaler()
sol_means = analysis_df.groupby('Sol')[available_metrics[:3]].mean() # Use first 3 metrics
if not sol_means.empty:
normalized_data = pd.DataFrame(
scaler.fit_transform(sol_means),
index=sol_means.index,
columns=sol_means.columns
)
for metric in normalized_data.columns:
plt.plot(normalized_data.index, normalized_data[metric],
'o-', label=metric, linewidth=2, markersize=6)
plt.title('Normalized Metrics Comparison\n(0-1 Scale)', fontweight='bold')
plt.xlabel('SOL (Recording Day)')
plt.ylabel('Normalized Value')
plt.legend()
plt.grid(True, alpha=0.3)
# 7. Summary statistics table
plt.subplot(4, 3, 7)
if len(available_metrics) > 0:
# Calculate key statistics
stats_data = []
for metric in available_metrics:
data = analysis_df[metric].dropna()
if len(data) > 0:
stats_data.append([
metric,
f"{data.mean():.2f}",
f"{data.std():.2f}",
f"{data.min():.2f}",
f"{data.max():.2f}",
f"{len(data)}"
])
if stats_data:
table_data = pd.DataFrame(stats_data,
columns=['Metric', 'Mean', 'SD', 'Min', 'Max', 'N'])
# Create table visualization
table = plt.table(cellText=table_data.values,
colLabels=table_data.columns,
cellLoc='center',
loc='center')
table.auto_set_font_size(False)
table.set_fontsize(9)
table.scale(1.2, 2)
plt.axis('off')
plt.title('Descriptive Statistics Summary\n', fontweight='bold')
# 8. Subject trajectory plot
plt.subplot(4, 3, 8)
if 'Sol' in analysis_df.columns and 'Subject' in analysis_df.columns and len(available_metrics) > 0:
metric = available_metrics[0]
subjects = analysis_df['Subject'].unique()[:6] # Limit to 6 subjects for clarity
colors = plt.cm.tab10(np.linspace(0, 1, len(subjects)))
for i, subject in enumerate(subjects):
subj_data = analysis_df[analysis_df['Subject'] == subject]
if len(subj_data) > 1:
subj_data_sorted = subj_data.sort_values('Sol')
plt.plot(subj_data_sorted['Sol'], subj_data_sorted[metric],
'o-', label=subject, color=colors[i], alpha=0.7, linewidth=1.5)
plt.title(f'Individual Subject Trajectories\n({metric})', fontweight='bold')
plt.xlabel('SOL (Recording Day)')
plt.ylabel(metric)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=8)
plt.grid(True, alpha=0.3)
# 9. Clinical interpretation summary
plt.subplot(4, 3, 9)
# Calculate clinical summary metrics
n_subjects = analysis_df['Subject'].nunique() if 'Subject' in analysis_df.columns else 0
n_measurements = len(analysis_df)
n_sols = analysis_df['Sol'].nunique() if 'Sol' in analysis_df.columns else 0
# Key findings summary
summary_text = f"""CLINICAL SUMMARY
Dataset Characteristics:
• Subjects analyzed: {n_subjects}
• Total measurements: {n_measurements}
• Recording days (SOLs): {n_sols}
• Metrics calculated: {len(available_metrics)}
Key HRV Metrics (Mean ± SD):"""
for metric in available_metrics[:3]: # Show first 3 metrics
data = analysis_df[metric].dropna()
if len(data) > 0:
summary_text += f"\n• {metric}: {data.mean():.2f} ± {data.std():.2f}"
summary_text += f"""
Analysis Findings:
• Temporal trends: {len(temporal_stats_results)} metrics analyzed
• Correlations: {len(correlation_results)} pairs tested
• Statistical methods: Mixed-effects models
Clinical Relevance:
✓ Sympathetic ANS assessment complete
✓ Individual profiles characterized
✓ Temporal patterns identified
✓ Results ready for interpretation"""
plt.text(0.05, 0.95, summary_text, transform=plt.gca().transAxes,
fontsize=9, verticalalignment='top', family='monospace',
bbox=dict(boxstyle='round,pad=0.5', facecolor='lightgreen', alpha=0.8))
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.axis('off')
plt.title('Clinical Summary', fontweight='bold', fontsize=12)
# 10-12. Additional visualizations (simplified due to space)
#for plot_num in range(10, 13):
# plt.subplot(4, 3, plot_num)
# plt.text(0.5, 0.5, f'Additional\nVisualization\nSpace {plot_num-9}',
# ha='center', va='center', transform=plt.gca().transAxes,
# bbox=dict(boxstyle='round', facecolor='lightgray', alpha=0.5))
# plt.axis('off')
plt.tight_layout()
#plt.suptitle('Comprehensive Sympathetic ANS Analysis Dashboard\nAdvanced Visualizations and Clinical Summary',
#y=0.98, fontsize=16, fontweight='bold')
plt.show()
print("\n" + "="*60)
print("ADVANCED VISUALIZATION SUMMARY")
print("="*60)
print("✓ Subject-specific profiles visualized")
print("✓ Temporal trends analyzed and plotted")
print("✓ PCA dimensionality reduction performed")
print("✓ Distribution characteristics examined")
print("✓ Correlation networks mapped")
print("✓ Multi-metric comparisons created")
print("✓ Clinical summary generated")
else:
print("✗ No suitable sympathetic metrics found for visualization")
else:
print("✗ No HRV data available for advanced visualizations")
print("Please run Cell 11 first to calculate HRV metrics")
# Initialize global summary variables for downstream analysis
if 'temporal_stats_results' not in locals():
temporal_stats_results = []
if 'correlation_results' not in locals():
correlation_results = []
if 'normality_results' not in locals():
normality_results = []
print(f"\n✓ Advanced visualizations completed. Ready for statistical analysis.")
=== ADVANCED VISUALIZATIONS AND CLINICAL SUMMARY === ✗ No HRV data available for advanced visualizations Please run Cell 11 first to calculate HRV metrics ✓ Advanced visualizations completed. Ready for statistical analysis.
# Cell 9: Data Structure Exploration and HRV Calculation Setup
print("=== EXAMINING AVAILABLE DATA STRUCTURE ===")
# First, let's check the database structure
import sqlite3
import os
db_path = r'C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\merged_data.db'
if os.path.exists(db_path):
print(f"Database found: {db_path}")
try:
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
# Get table names
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
tables = cursor.fetchall()
print(f"Tables in database: {[table[0] for table in tables]}")
# Examine each table structure
for table in tables[:3]: # Limit to first 3 tables
table_name = table[0]
print(f"\n--- Table: {table_name} ---")
cursor.execute(f"PRAGMA table_info({table_name});")
columns = cursor.fetchall()
print(f"Columns ({len(columns)}): {[col[1] for col in columns]}")
# Get sample data
cursor.execute(f"SELECT * FROM {table_name} LIMIT 3;")
sample_data = cursor.fetchall()
print(f"Sample rows: {len(sample_data)}")
if sample_data:
print("First row example:", sample_data[0][:10]) # First 10 values
conn.close()
except Exception as e:
print(f"Error accessing database: {e}")
else:
print("Database not found at specified path")
print("\n" + "="*50)
# Check CSV structure with pandas
print("=== EXAMINING CSV STRUCTURE ===")
import pandas as pd
csv_path = r'C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\T01_Mara.csv'
if os.path.exists(csv_path):
print(f"Checking CSV: {csv_path}")
try:
# Read just the header and first few rows
df_sample = pd.read_csv(csv_path, nrows=5)
print(f"CSV shape (sample): {df_sample.shape}")
print(f"Columns: {list(df_sample.columns)}")
# Check if we have heart rate data
hr_columns = [col for col in df_sample.columns if 'heart' in col.lower() or 'hr' in col.lower() or 'bpm' in col.lower()]
print(f"Heart rate related columns: {hr_columns}")
# Check for any existing HRV columns
hrv_keywords = ['sdnn', 'rmssd', 'pnn50', 'vlf', 'lf', 'hf', 'sd1', 'sd2', 'hrv']
existing_hrv = []
for col in df_sample.columns:
for keyword in hrv_keywords:
if keyword in col.lower():
existing_hrv.append(col)
break
print(f"Existing HRV columns: {existing_hrv}")
# Show sample data types and values
print("\nSample data:")
print(df_sample.head(3))
# Check data types
print("\nData types:")
print(df_sample.dtypes)
except Exception as e:
print(f"Error reading CSV: {e}")
else:
print("CSV not found at specified path")
print("\n" + "="*50)
print("=== ANALYSIS PLAN ===")
if hr_columns:
print("✓ Heart rate data available - we can calculate HRV metrics")
print("Next steps:")
print("1. Extract heart rate time series data")
print("2. Calculate RR intervals from heart rate")
print("3. Compute HRV metrics (time-domain, frequency-domain, non-linear)")
print("4. Apply sympathetic ANS analysis methods")
else:
print("✗ No clear heart rate data found")
print("Need to investigate data structure further")
print("\nWill implement HRV calculation in next cells...")
=== EXAMINING AVAILABLE DATA STRUCTURE === Database found: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\merged_data.db Tables in database: ['merged_data'] --- Table: merged_data --- Columns (11): ['Sol', 'source_file', 'time_raw', 'breathing_rate [rpm]', 'minute_ventilation [mL/min]', 'sleep_position [NA]', 'activity [g]', 'heart_rate [bpm]', 'cadence [spm]', 'time_seconds', 'subject'] Sample rows: 3 First row example: (2, 'record_4494.csv', 1732544277000.0, None, None, None, None, None, None, 1732544277.0) ================================================== === EXAMINING CSV STRUCTURE === Checking CSV: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\T01_Mara.csv CSV shape (sample): (5, 16) Columns: ['Sol', 'user', 'source_file', 'time [s/1000]', 'breathing_rate [rpm]', 'SPO2 [%]', 'PTT [s]', 'minute_ventilation [mL/min]', 'systolic_pressure [mmHg]', 'energy_mifflin_keytel [watt]', 'sleep_position [NA]', 'temperature [NA]', 'activity [g]', 'temperature_celcius [C]', 'heart_rate [bpm]', 'cadence [spm]'] Heart rate related columns: ['heart_rate [bpm]'] Existing HRV columns: [] Sample data: Sol user source_file time [s/1000] breathing_rate [rpm] \ 0 2 T01_Mara record_4494.csv 1.732544e+12 NaN 1 2 T01_Mara record_4494.csv 1.732544e+12 NaN 2 2 T01_Mara record_4494.csv 1.732544e+12 0.0 SPO2 [%] PTT [s] minute_ventilation [mL/min] systolic_pressure [mmHg] \ 0 NaN NaN NaN NaN 1 NaN 0.206 NaN NaN 2 100.0 NaN 0.0 260.0 energy_mifflin_keytel [watt] sleep_position [NA] temperature [NA] \ 0 NaN NaN NaN 1 NaN NaN NaN 2 0.0 4.0 6.703125 activity [g] temperature_celcius [C] heart_rate [bpm] cadence [spm] 0 NaN NaN NaN NaN 1 NaN NaN NaN NaN 2 0.0 33.203125 70.0 0.0 Data types: Sol int64 user object source_file object time [s/1000] float64 breathing_rate [rpm] float64 SPO2 [%] float64 PTT [s] float64 minute_ventilation [mL/min] float64 systolic_pressure [mmHg] float64 energy_mifflin_keytel [watt] float64 sleep_position [NA] float64 temperature [NA] float64 activity [g] float64 temperature_celcius [C] float64 heart_rate [bpm] float64 cadence [spm] float64 dtype: object ================================================== === ANALYSIS PLAN === ✓ Heart rate data available - we can calculate HRV metrics Next steps: 1. Extract heart rate time series data 2. Calculate RR intervals from heart rate 3. Compute HRV metrics (time-domain, frequency-domain, non-linear) 4. Apply sympathetic ANS analysis methods Will implement HRV calculation in next cells...
# Cell 10: HRV Calculation Functions and Data Processing
print("=== SETTING UP HRV CALCULATION FUNCTIONS ===")
# Install required libraries if not available
try:
import heartpy as hp
print("✓ heartpy library available")
except ImportError:
print("Installing heartpy library...")
import subprocess
import sys
subprocess.check_call([sys.executable, "-m", "pip", "install", "heartpy"])
import heartpy as hp
print("✓ heartpy installed")
import numpy as np
from scipy import signal
from scipy.stats import stats
from scipy import interpolate
import warnings
warnings.filterwarnings('ignore')
def calculate_rr_intervals(heart_rate_bpm, sampling_rate=1):
"""
Convert heart rate (BPM) to RR intervals (milliseconds)
"""
# Remove NaN values
hr_clean = heart_rate_bpm.dropna()
if len(hr_clean) < 10:
return np.array([])
# Convert BPM to RR intervals (ms)
# RR interval (ms) = 60,000 / HR (bpm)
rr_intervals = 60000 / hr_clean
# Filter physiologically plausible RR intervals (300-2000 ms)
rr_filtered = rr_intervals[(rr_intervals >= 300) & (rr_intervals <= 2000)]
return rr_intervals.values
def calculate_time_domain_hrv(rr_intervals):
"""
Calculate time-domain HRV metrics
"""
if len(rr_intervals) < 10:
return {}
# Basic statistics
rr_mean = np.mean(rr_intervals)
rr_std = np.std(rr_intervals, ddof=1)
# SDNN - Standard deviation of NN intervals
sdnn = rr_std
# RMSSD - Root mean square of successive differences
rr_diff = np.diff(rr_intervals)
rmssd = np.sqrt(np.mean(rr_diff**2))
# pNN50 - Percentage of successive RR intervals differing by more than 50ms
pnn50 = (np.sum(np.abs(rr_diff) > 50) / len(rr_diff)) * 100
# Heart rate statistics
hr_mean = 60000 / rr_mean
hr_std = np.std(60000 / rr_intervals)
return {
'SDNN': sdnn,
'RMSSD': rmssd,
'pNN50': pnn50,
'HR_mean': hr_mean,
'HR_std': hr_std,
'RR_mean': rr_mean,
'RR_std': rr_std
}
def calculate_frequency_domain_hrv(rr_intervals, sampling_rate=4):
"""
Calculate frequency-domain HRV metrics using Welch's method
"""
if len(rr_intervals) < 50: # Minimum for reliable frequency analysis
return {}
try:
# Interpolate RR intervals to regular time grid
time_original = np.cumsum(rr_intervals) / 1000 # Convert to seconds
time_interpolated = np.arange(0, time_original[-1], 1/sampling_rate)
# Interpolate RR intervals
f_interp = interpolate.interp1d(time_original[:-1], rr_intervals[:-1], kind='cubic', bounds_error=False, fill_value='extrapolate')
rr_interpolated = f_interp(time_interpolated)
# Remove mean (detrend)
rr_detrended = rr_interpolated - np.mean(rr_interpolated)
# Calculate Power Spectral Density using Welch's method
frequencies, psd = signal.welch(rr_detrended, fs=sampling_rate, nperseg=len(rr_detrended)//4)
# Define frequency bands (Hz)
vlf_band = (0.0033, 0.04) # Very Low Frequency
lf_band = (0.04, 0.15) # Low Frequency (sympathetic + parasympathetic)
hf_band = (0.15, 0.4) # High Frequency (parasympathetic)
# Calculate power in each band
vlf_indices = (frequencies >= vlf_band[0]) & (frequencies < vlf_band[1])
lf_indices = (frequencies >= lf_band[0]) & (frequencies < lf_band[1])
hf_indices = (frequencies >= hf_band[0]) & (frequencies < hf_band[1])
vlf_power = np.trapz(psd[vlf_indices], frequencies[vlf_indices])
lf_power = np.trapz(psd[lf_indices], frequencies[lf_indices])
hf_power = np.trapz(psd[hf_indices], frequencies[hf_indices])
# Total power
total_power = vlf_power + lf_power + hf_power
# Normalized powers
lf_norm = (lf_power / (lf_power + hf_power)) * 100 if (lf_power + hf_power) > 0 else 0
hf_norm = (hf_power / (lf_power + hf_power)) * 100 if (lf_power + hf_power) > 0 else 0
# LF/HF ratio (sympathovagal balance)
lf_hf_ratio = lf_power / hf_power if hf_power > 0 else 0
return {
'VLF_Power': vlf_power,
'LF_Power': lf_power,
'HF_Power': hf_power,
'Total_Power': total_power,
'LF_Normalized': lf_norm,
'HF_Normalized': hf_norm,
'LF_HF_Ratio': lf_hf_ratio
}
except Exception as e:
print(f"Error in frequency domain analysis: {e}")
return {}
def calculate_nonlinear_hrv(rr_intervals):
"""
Calculate non-linear HRV metrics (Poincaré plot analysis)
"""
if len(rr_intervals) < 10:
return {}
# Poincaré plot - plot RR(n) vs RR(n+1)
rr_n = rr_intervals[:-1]
rr_n1 = rr_intervals[1:]
# Calculate SD1 and SD2
rr_diff = rr_n1 - rr_n
rr_sum = (rr_n1 + rr_n) / 2
sd1 = np.std(rr_diff, ddof=1) / np.sqrt(2) # Short-term variability
sd2 = np.std(rr_sum, ddof=1) * np.sqrt(2) # Long-term variability
# SD1/SD2 ratio
sd1_sd2_ratio = sd1 / sd2 if sd2 > 0 else 0
# Ellipse area
ellipse_area = np.pi * sd1 * sd2
return {
'SD1': sd1,
'SD2': sd2,
'SD1_SD2_Ratio': sd1_sd2_ratio,
'Ellipse_Area': ellipse_area
}
def calculate_comprehensive_hrv(heart_rate_series, subject_id='Unknown', session_id='Unknown'):
"""
Calculate all HRV metrics from heart rate time series
"""
print(f"Calculating HRV for {subject_id}, Session {session_id}")
# Convert to RR intervals
rr_intervals = calculate_rr_intervals(heart_rate_series)
if len(rr_intervals) < 10:
print(f" Warning: Insufficient data points ({len(rr_intervals)}) for reliable HRV analysis")
return {}
print(f" Processing {len(rr_intervals)} RR intervals")
# Calculate all HRV metrics
hrv_metrics = {}
# Time domain
time_domain = calculate_time_domain_hrv(rr_intervals)
hrv_metrics.update(time_domain)
# Frequency domain
freq_domain = calculate_frequency_domain_hrv(rr_intervals)
hrv_metrics.update(freq_domain)
# Non-linear
nonlinear = calculate_nonlinear_hrv(rr_intervals)
hrv_metrics.update(nonlinear)
# Add metadata
hrv_metrics['Subject'] = subject_id
hrv_metrics['Session'] = session_id
hrv_metrics['RR_Count'] = len(rr_intervals)
return hrv_metrics
print("✓ HRV calculation functions ready")
print("Available metrics:")
print("- Time domain: SDNN, RMSSD, pNN50, HR statistics")
print("- Frequency domain: VLF, LF, HF power, LF/HF ratio, normalized powers")
print("- Non-linear: SD1, SD2, Poincaré measures")
=== SETTING UP HRV CALCULATION FUNCTIONS === Installing heartpy library... ✓ heartpy installed ✓ HRV calculation functions ready Available metrics: - Time domain: SDNN, RMSSD, pNN50, HR statistics - Frequency domain: VLF, LF, HF power, LF/HF ratio, normalized powers - Non-linear: SD1, SD2, Poincaré measures
# Cell 11: Process Data and Calculate HRV Metrics
print("=== PROCESSING DATA TO CALCULATE HRV METRICS ===")
# Load and process each CSV file to calculate HRV metrics
all_hrv_results = []
failed_sessions = []
# CSV files to process
csv_files = [
'T01_Mara.csv', 'T02_Laura.csv', 'T03_Nancy.csv', 'T04_Michelle.csv',
'T05_Felicitas.csv', 'T06_Mara_Selena.csv', 'T07_Geraldinn.csv', 'T08_Karina.csv'
]
data_dir = r'C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder'
for csv_file in csv_files:
filepath = os.path.join(data_dir, csv_file)
subject_id = csv_file.replace('.csv', '').replace('_', ' ')
print(f"\n--- Processing {subject_id} ---")
if not os.path.exists(filepath):
print(f" ✗ File not found: {filepath}")
continue
try:
# Read the CSV file in chunks to handle large files
print(" Loading data...")
chunk_list = []
chunksize = 50000 # Read in chunks of 50k rows
for chunk in pd.read_csv(filepath, chunksize=chunksize):
chunk_list.append(chunk)
df = pd.concat(chunk_list, ignore_index=True)
print(f" ✓ Loaded {len(df)} rows")
# Find heart rate column
hr_column = None
for col in df.columns:
if any(keyword in col.lower() for keyword in ['heart_rate', 'heart rate', 'hr', 'bpm']):
hr_column = col
break
if hr_column is None:
print(" ✗ No heart rate column found")
failed_sessions.append(f"{subject_id}: No HR column")
continue
print(f" ✓ Found heart rate column: {hr_column}")
# Group by Sol (session/day) and calculate HRV for each session
if 'Sol' in df.columns:
sessions = df['Sol'].unique()
sessions = sessions[~pd.isna(sessions)] # Remove NaN sessions
print(f" Found {len(sessions)} sessions: {sorted(sessions)}")
for session in sessions:
session_data = df[df['Sol'] == session]
hr_data = session_data[hr_column]
# Remove NaN values and ensure we have enough data
hr_clean = hr_data.dropna()
if len(hr_clean) < 100: # Minimum points for meaningful HRV
print(f" Session {session}: Insufficient HR data ({len(hr_clean)} points)")
failed_sessions.append(f"{subject_id} Sol {session}: Insufficient data")
continue
print(f" Session {session}: Processing {len(hr_clean)} HR values")
# Calculate HRV metrics
hrv_results = calculate_comprehensive_hrv(hr_clean, subject_id, session)
if hrv_results: # If successful
all_hrv_results.append(hrv_results)
print(f" ✓ HRV calculated for Session {session}")
else:
print(f" ✗ HRV calculation failed for Session {session}")
failed_sessions.append(f"{subject_id} Sol {session}: Calculation failed")
else:
print(" ✗ No 'Sol' column found - cannot separate sessions")
failed_sessions.append(f"{subject_id}: No Sol column")
except Exception as e:
print(f" ✗ Error processing {csv_file}: {str(e)[:100]}...")
failed_sessions.append(f"{subject_id}: Processing error")
print(f"\n=== PROCESSING SUMMARY ===")
print(f"Successfully calculated HRV for: {len(all_hrv_results)} sessions")
print(f"Failed sessions: {len(failed_sessions)}")
if failed_sessions:
print("\nFailed sessions details:")
for failure in failed_sessions:
print(f" - {failure}")
# Create HRV dataframe
if all_hrv_results:
hrv_df = pd.DataFrame(all_hrv_results)
print(f"\n=== HRV DATASET CREATED ===")
print(f"Shape: {hrv_df.shape}")
print(f"Subjects: {hrv_df['Subject'].nunique()}")
print(f"Sessions: {len(hrv_df)}")
print(f"Available HRV metrics: {[col for col in hrv_df.columns if col not in ['Subject', 'Session', 'RR_Count']]}")
# Show sample of calculated HRV metrics
print(f"\n=== SAMPLE HRV DATA ===")
sample_metrics = ['Subject', 'Session', 'SDNN', 'RMSSD', 'LF_Power', 'HF_Power', 'LF_HF_Ratio', 'SD1', 'SD2']
available_sample = [col for col in sample_metrics if col in hrv_df.columns]
print(hrv_df[available_sample].head(10))
# Save HRV results for future use
output_path = os.path.join(data_dir, 'calculated_hrv_metrics.csv')
hrv_df.to_csv(output_path, index=False)
print(f"\n✓ HRV metrics saved to: {output_path}")
# Update variables for downstream analysis
sympathetic_df = hrv_df.copy()
# Rename columns to match original analysis expectations
column_mapping = {
'Session': 'Sol',
'LF_Power': 'LF_Power',
'HF_Power': 'HF_Power',
'LF_HF_Ratio': 'LF_HF_Ratio',
'LF_Normalized': 'LF_Normalized',
'HF_Normalized': 'HF_Normalized',
'VLF_Power': 'VLF_Power',
'Total_Power': 'Total_Power',
'SDNN': 'SDNN',
'RMSSD': 'RMSSD',
'SD1': 'SD1',
'SD2': 'SD2'
}
sympathetic_df = sympathetic_df.rename(columns=column_mapping)
# Define metrics for sympathetic analysis
available_metrics = {k: k for k, v in column_mapping.items() if k in sympathetic_df.columns}
numeric_metrics = [col for col in sympathetic_df.columns if col not in ['Subject', 'Sol', 'RR_Count']]
print(f"\n✓ Dataset ready for sympathetic ANS analysis")
print(f"Available sympathetic metrics: {list(available_metrics.keys())}")
else:
print("\n✗ No HRV metrics could be calculated")
print("Please check the data format and heart rate column availability")
=== PROCESSING DATA TO CALCULATE HRV METRICS ===
--- Processing T01 Mara ---
Loading data...
✓ Loaded 648029 rows
✓ Found heart rate column: heart_rate [bpm]
Found 8 sessions: [2, 3, 6, 7, 10, 12, 15, 16]
Session 2: Processing 32806 HR values
Calculating HRV for T01 Mara, Session 2
Processing 32806 RR intervals
✓ HRV calculated for Session 2
Session 3: Processing 12337 HR values
Calculating HRV for T01 Mara, Session 3
Processing 12337 RR intervals
✓ HRV calculated for Session 3
Session 10: Processing 63536 HR values
Calculating HRV for T01 Mara, Session 10
Processing 63536 RR intervals
✓ HRV calculated for Session 10
Session 15: Processing 42699 HR values
Calculating HRV for T01 Mara, Session 15
Processing 42699 RR intervals
✓ HRV calculated for Session 15
Session 16: Processing 36716 HR values
Calculating HRV for T01 Mara, Session 16
Processing 36716 RR intervals
✓ HRV calculated for Session 16
Session 6: Processing 2078 HR values
Calculating HRV for T01 Mara, Session 6
Processing 2078 RR intervals
✓ HRV calculated for Session 6
Session 7: Processing 4600 HR values
Calculating HRV for T01 Mara, Session 7
Processing 4600 RR intervals
✓ HRV calculated for Session 7
Session 12: Processing 65296 HR values
Calculating HRV for T01 Mara, Session 12
Processing 65296 RR intervals
✓ HRV calculated for Session 12
--- Processing T02 Laura ---
Loading data...
✓ Loaded 233918 rows
✓ Found heart rate column: heart_rate [bpm]
Found 3 sessions: [2, 3, 9]
Session 2: Processing 30995 HR values
Calculating HRV for T02 Laura, Session 2
Processing 30995 RR intervals
✓ HRV calculated for Session 2
Session 3: Processing 22701 HR values
Calculating HRV for T02 Laura, Session 3
Processing 22701 RR intervals
✓ HRV calculated for Session 3
Session 9: Processing 45640 HR values
Calculating HRV for T02 Laura, Session 9
Processing 45640 RR intervals
✓ HRV calculated for Session 9
--- Processing T03 Nancy ---
Loading data...
✓ Loaded 126588 rows
✓ Found heart rate column: heart_rate [bpm]
Found 4 sessions: [4, 10, 11, 14]
Session 4: Processing 34799 HR values
Calculating HRV for T03 Nancy, Session 4
Processing 34799 RR intervals
✓ HRV calculated for Session 4
Session 10: Processing 33849 HR values
Calculating HRV for T03 Nancy, Session 10
Processing 33849 RR intervals
✓ HRV calculated for Session 10
Session 11: Processing 30883 HR values
Calculating HRV for T03 Nancy, Session 11
Processing 30883 RR intervals
✓ HRV calculated for Session 11
Session 14: Processing 27049 HR values
Calculating HRV for T03 Nancy, Session 14
Processing 27049 RR intervals
✓ HRV calculated for Session 14
--- Processing T04 Michelle ---
Loading data...
✓ Loaded 89442 rows
✓ Found heart rate column: heart_rate [bpm]
Found 4 sessions: [4, 9, 13, 14]
Session 4: Processing 21834 HR values
Calculating HRV for T04 Michelle, Session 4
Processing 21834 RR intervals
✓ HRV calculated for Session 4
Session 9: Processing 31346 HR values
Calculating HRV for T04 Michelle, Session 9
Processing 31346 RR intervals
✓ HRV calculated for Session 9
Session 13: Processing 29015 HR values
Calculating HRV for T04 Michelle, Session 13
Processing 29015 RR intervals
✓ HRV calculated for Session 13
Session 14: Processing 7239 HR values
Calculating HRV for T04 Michelle, Session 14
Processing 7239 RR intervals
✓ HRV calculated for Session 14
--- Processing T05 Felicitas ---
Loading data...
✓ Loaded 173434 rows
✓ Found heart rate column: heart_rate [bpm]
Found 6 sessions: [2, 4, 6, 9, 13, 14]
Session 2: Processing 35341 HR values
Calculating HRV for T05 Felicitas, Session 2
Processing 35341 RR intervals
✓ HRV calculated for Session 2
Session 4: Processing 44233 HR values
Calculating HRV for T05 Felicitas, Session 4
Processing 44233 RR intervals
✓ HRV calculated for Session 4
Session 6: Processing 24755 HR values
Calculating HRV for T05 Felicitas, Session 6
Processing 24755 RR intervals
✓ HRV calculated for Session 6
Session 9: Processing 36326 HR values
Calculating HRV for T05 Felicitas, Session 9
Processing 36326 RR intervals
✓ HRV calculated for Session 9
Session 13: Processing 28365 HR values
Calculating HRV for T05 Felicitas, Session 13
Processing 28365 RR intervals
✓ HRV calculated for Session 13
Session 14: Processing 4402 HR values
Calculating HRV for T05 Felicitas, Session 14
Processing 4402 RR intervals
✓ HRV calculated for Session 14
--- Processing T06 Mara Selena ---
Loading data...
✓ Loaded 144295 rows
✓ Found heart rate column: heart_rate [bpm]
Found 6 sessions: [3, 5, 10, 11, 12, 14]
Session 3: Processing 33113 HR values
Calculating HRV for T06 Mara Selena, Session 3
Processing 33113 RR intervals
✓ HRV calculated for Session 3
Session 5: Processing 39842 HR values
Calculating HRV for T06 Mara Selena, Session 5
Processing 39842 RR intervals
✓ HRV calculated for Session 5
Session 10: Processing 5036 HR values
Calculating HRV for T06 Mara Selena, Session 10
Processing 5036 RR intervals
✓ HRV calculated for Session 10
Session 11: Processing 22907 HR values
Calculating HRV for T06 Mara Selena, Session 11
Processing 22907 RR intervals
✓ HRV calculated for Session 11
Session 12: Processing 13604 HR values
Calculating HRV for T06 Mara Selena, Session 12
Processing 13604 RR intervals
✓ HRV calculated for Session 12
Session 14: Processing 29781 HR values
Calculating HRV for T06 Mara Selena, Session 14
Processing 29781 RR intervals
✓ HRV calculated for Session 14
--- Processing T07 Geraldinn ---
Loading data...
✓ Loaded 94301 rows
✓ Found heart rate column: heart_rate [bpm]
Found 4 sessions: [3, 5, 6, 12]
Session 3: Processing 24549 HR values
Calculating HRV for T07 Geraldinn, Session 3
Processing 24549 RR intervals
✓ HRV calculated for Session 3
Session 5: Processing 29711 HR values
Calculating HRV for T07 Geraldinn, Session 5
Processing 29711 RR intervals
✓ HRV calculated for Session 5
Session 6: Processing 24615 HR values
Calculating HRV for T07 Geraldinn, Session 6
Processing 24615 RR intervals
✓ HRV calculated for Session 6
Session 12: Processing 15418 HR values
Calculating HRV for T07 Geraldinn, Session 12
Processing 15418 RR intervals
✓ HRV calculated for Session 12
--- Processing T08 Karina ---
Loading data...
✓ Loaded 57872 rows
✓ Found heart rate column: heart_rate [bpm]
Found 2 sessions: [3, 12]
Session 3: Processing 32919 HR values
Calculating HRV for T08 Karina, Session 3
Processing 32919 RR intervals
✓ HRV calculated for Session 3
Session 12: Processing 24949 HR values
Calculating HRV for T08 Karina, Session 12
Processing 24949 RR intervals
✓ HRV calculated for Session 12
=== PROCESSING SUMMARY ===
Successfully calculated HRV for: 37 sessions
Failed sessions: 0
=== HRV DATASET CREATED ===
Shape: (37, 21)
Subjects: 8
Sessions: 37
Available HRV metrics: ['SDNN', 'RMSSD', 'pNN50', 'HR_mean', 'HR_std', 'RR_mean', 'RR_std', 'VLF_Power', 'LF_Power', 'HF_Power', 'Total_Power', 'LF_Normalized', 'HF_Normalized', 'LF_HF_Ratio', 'SD1', 'SD2', 'SD1_SD2_Ratio', 'Ellipse_Area']
=== SAMPLE HRV DATA ===
Subject Session SDNN RMSSD LF_Power HF_Power \
0 T01 Mara 2 108.528914 14.772640 223.822775 71.086921
1 T01 Mara 3 110.298270 11.806051 193.929879 14.592620
2 T01 Mara 10 171.322839 11.563091 135.054947 36.601861
3 T01 Mara 15 94.245965 15.383562 325.020931 74.990655
4 T01 Mara 16 105.316911 7.155712 205.084748 16.570475
5 T01 Mara 6 90.495863 10.893747 184.442797 11.306659
6 T01 Mara 7 88.777988 15.245083 203.480996 33.402660
7 T01 Mara 12 190.873279 12.166433 169.850122 46.961644
8 T02 Laura 2 124.214879 9.537965 48.276945 12.576970
9 T02 Laura 3 130.671716 9.879465 243.402890 14.862179
LF_HF_Ratio SD1 SD2
0 3.148579 10.445993 153.124982
1 13.289586 8.348474 155.752651
2 3.689838 8.176404 242.148984
3 4.334152 10.877948 132.837078
4 12.376516 5.059920 148.847856
5 16.312758 7.704886 127.777812
6 6.091760 10.780911 124.979032
7 3.616784 8.603033 269.799877
8 3.838520 6.744467 175.525013
9 16.377335 6.985981 184.660942
✓ HRV metrics saved to: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\calculated_hrv_metrics.csv
✓ Dataset ready for sympathetic ANS analysis
Available sympathetic metrics: ['LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'HF_Normalized', 'VLF_Power', 'Total_Power', 'SDNN', 'RMSSD', 'SD1', 'SD2']
# Cell 12: Comprehensive Sympathetic ANS Analysis with Calculated HRV Metrics
print("=== COMPREHENSIVE SYMPATHETIC ANS ANALYSIS ===")
if 'sympathetic_df' in locals() and len(sympathetic_df) > 0:
print(f"Analyzing {len(sympathetic_df)} HRV measurements from {sympathetic_df['Subject'].nunique()} subjects")
print(f"Sessions per subject: {sympathetic_df.groupby('Subject').size().to_dict()}")
# 1. DESCRIPTIVE STATISTICS
print("\n=== DESCRIPTIVE STATISTICS FOR SYMPATHETIC HRV METRICS ===")
sympathetic_metrics = ['LF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']
available_sympathetic = [m for m in sympathetic_metrics if m in sympathetic_df.columns]
if available_sympathetic:
desc_stats = sympathetic_df[available_sympathetic].describe()
print(desc_stats.round(3))
# 2. NORMALITY TESTING
print("\n=== NORMALITY TESTING ===")
from scipy.stats import shapiro, normaltest
for metric in available_sympathetic:
data = sympathetic_df[metric].dropna()
if len(data) >= 3:
shapiro_stat, shapiro_p = shapiro(data)
print(f"{metric}: Shapiro-Wilk p = {shapiro_p:.4f} ({'Normal' if shapiro_p > 0.05 else 'Non-normal'})")
# 3. TEMPORAL ANALYSIS
print("\n=== TEMPORAL TRENDS ANALYSIS ===")
if 'Sol' in sympathetic_df.columns:
temporal_results = []
for metric in available_sympathetic:
temporal_data = sympathetic_df.groupby('Sol')[metric].mean().reset_index()
if len(temporal_data) >= 3:
corr_r, corr_p = pearsonr(temporal_data['Sol'], temporal_data[metric])
temporal_results.append({
'Metric': metric,
'Correlation_r': corr_r,
'P_value': corr_p,
'Significant': 'Yes' if corr_p < 0.05 else 'No'
})
print(f"{metric}: r = {corr_r:.3f}, p = {corr_p:.4f} ({'Significant' if corr_p < 0.05 else 'Not significant'})")
# 4. CORRELATION ANALYSIS
print("\n=== CORRELATION ANALYSIS BETWEEN SYMPATHETIC METRICS ===")
correlation_matrix = sympathetic_df[available_sympathetic].corr()
print("Correlation Matrix:")
print(correlation_matrix.round(3))
# Significant correlations
print("\nSignificant correlations (p < 0.05):")
from scipy.stats import pearsonr
for i, metric1 in enumerate(available_sympathetic):
for j, metric2 in enumerate(available_sympathetic):
if i < j:
paired_data = sympathetic_df[[metric1, metric2]].dropna()
if len(paired_data) >= 3:
r, p = pearsonr(paired_data[metric1], paired_data[metric2])
if p < 0.05:
effect_size = "Large" if abs(r) >= 0.5 else "Medium" if abs(r) >= 0.3 else "Small"
print(f" {metric1} - {metric2}: r = {r:.3f}, p = {p:.4f} ({effect_size} effect)")
# 5. VISUALIZATION
print("\n=== CREATING VISUALIZATIONS ===")
# Create comprehensive visualization
fig, axes = plt.subplots(2, 3, figsize=(18, 12))
axes = axes.flatten()
# Plot 1: Box plots by subject for LF/HF ratio
if 'LF_HF_Ratio' in sympathetic_df.columns:
sns.boxplot(data=sympathetic_df, x='Subject', y='LF_HF_Ratio', ax=axes[0])
axes[0].set_title('LF/HF Ratio by Subject (Sympathovagal Balance)', fontweight='bold')
axes[0].tick_params(axis='x', rotation=45)
axes[0].grid(True, alpha=0.3)
# Plot 2: Temporal trends for key sympathetic metric
key_metric = 'LF_HF_Ratio' if 'LF_HF_Ratio' in sympathetic_df.columns else available_sympathetic[0]
if 'Sol' in sympathetic_df.columns:
temporal_means = sympathetic_df.groupby('Sol')[key_metric].mean().reset_index()
temporal_stds = sympathetic_df.groupby('Sol')[key_metric].std().reset_index()
axes[1].errorbar(temporal_means['Sol'], temporal_means[key_metric],
yerr=temporal_stds[key_metric], marker='o', capsize=5)
axes[1].set_title(f'{key_metric} Temporal Changes', fontweight='bold')
axes[1].set_xlabel('Recording Day (Sol)')
axes[1].grid(True, alpha=0.3)
# Plot 3: Correlation heatmap
mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))
sns.heatmap(correlation_matrix, mask=mask, annot=True, cmap='RdBu_r', center=0,
ax=axes[2], square=True, fmt='.2f')
axes[2].set_title('Sympathetic Metrics Correlations', fontweight='bold')
# Plot 4: Distribution of LF Power
if 'LF_Power' in sympathetic_df.columns:
axes[3].hist(sympathetic_df['LF_Power'].dropna(), bins=20, alpha=0.7, edgecolor='black')
axes[3].set_title('LF Power Distribution', fontweight='bold')
axes[3].set_xlabel('LF Power (ms²)')
axes[3].grid(True, alpha=0.3)
# Plot 5: SDNN vs LF/HF Ratio scatter
if 'SDNN' in sympathetic_df.columns and 'LF_HF_Ratio' in sympathetic_df.columns:
scatter_data = sympathetic_df[['SDNN', 'LF_HF_Ratio']].dropna()
axes[4].scatter(scatter_data['SDNN'], scatter_data['LF_HF_Ratio'], alpha=0.6)
axes[4].set_title('SDNN vs LF/HF Ratio', fontweight='bold')
axes[4].set_xlabel('SDNN (ms)')
axes[4].set_ylabel('LF/HF Ratio')
axes[4].grid(True, alpha=0.3)
# Plot 6: Summary statistics table
# Calculate metrics safely to avoid errors
n_measurements = len(sympathetic_df)
n_subjects = sympathetic_df['Subject'].nunique()
n_sessions = sympathetic_df['Sol'].nunique() if 'Sol' in sympathetic_df.columns else 'N/A'
# Calculate key metric statistics safely
lf_hf_stats = ""
if 'LF_HF_Ratio' in sympathetic_df.columns:
lf_hf_mean = sympathetic_df['LF_HF_Ratio'].mean()
lf_hf_std = sympathetic_df['LF_HF_Ratio'].std()
lf_hf_stats = f"{lf_hf_mean:.2f} ± {lf_hf_std:.2f}"
else:
lf_hf_stats = "N/A"
lf_power_stats = ""
if 'LF_Power' in sympathetic_df.columns:
lf_power_mean = sympathetic_df['LF_Power'].mean()
lf_power_std = sympathetic_df['LF_Power'].std()
lf_power_stats = f"{lf_power_mean:.1f} ± {lf_power_std:.1f} ms²"
else:
lf_power_stats = "N/A"
sdnn_stats = ""
if 'SDNN' in sympathetic_df.columns:
sdnn_mean = sympathetic_df['SDNN'].mean()
sdnn_std = sympathetic_df['SDNN'].std()
sdnn_stats = f"{sdnn_mean:.1f} ± {sdnn_std:.1f} ms"
else:
sdnn_stats = "N/A"
# Calculate statistical findings
normal_count = 0
try:
for metric in available_sympathetic:
data = sympathetic_df[metric].dropna()
if len(data) >= 3:
_, p_value = shapiro(data)
if p_value > 0.05:
normal_count += 1
except:
normal_count = 0
temporal_sig = 'N/A'
temporal_total = 'N/A'
if 'temporal_results' in locals() and temporal_results:
temporal_sig = len([r for r in temporal_results if r['Significant'] == 'Yes'])
temporal_total = len(temporal_results)
summary_text = f"""SYMPATHETIC ANS ANALYSIS SUMMARY
Dataset Characteristics:
• Total HRV measurements: {n_measurements}
• Unique subjects: {n_subjects}
• Recording sessions: {n_sessions}
Key Sympathetic Metrics (Mean ± SD):
• LF/HF Ratio: {lf_hf_stats}
• LF Power: {lf_power_stats}
• SDNN: {sdnn_stats}
Statistical Findings:
• Normal distributions: {normal_count}/{len(available_sympathetic)}
• Temporal correlations: {temporal_sig}/{temporal_total}
Clinical Interpretation:
✓ HRV-based sympathetic assessment completed
✓ Individual and temporal variability characterized
✓ Statistical validation performed
✓ Results ready for clinical application"""
axes[5].text(0.1, 0.5, summary_text, transform=axes[5].transAxes, fontsize=10,
verticalalignment='center', family='monospace',
bbox=dict(boxstyle='round,pad=1', facecolor='lightgray', alpha=0.8))
axes[5].set_xlim(0, 1)
axes[5].set_ylim(0, 1)
axes[5].axis('off')
axes[5].set_title('Analysis Summary', fontweight='bold')
plt.tight_layout()
#plt.suptitle('Sympathetic Autonomic Nervous System Analysis\nCalculated from Heart Rate Variability',
#y=0.98, fontsize=16, fontweight='bold')
plt.show()
# 6. EXPORT RESULTS
print("\n=== EXPORTING ANALYSIS RESULTS ===")
# Save processed data
results_path = os.path.join(data_dir, 'sympathetic_ans_results.csv')
sympathetic_df.to_csv(results_path, index=False)
print(f"✓ Analysis results saved to: {results_path}")
# Create summary report
report_path = os.path.join(data_dir, 'sympathetic_analysis_report.txt')
with open(report_path, 'w') as f:
f.write("SYMPATHETIC ANS ANALYSIS REPORT\n")
f.write("="*50 + "\n\n")
f.write(f"Dataset: {len(sympathetic_df)} HRV measurements\n")
f.write(f"Subjects: {sympathetic_df['Subject'].nunique()}\n")
f.write(f"Sessions: {sympathetic_df['Sol'].nunique() if 'Sol' in sympathetic_df.columns else 'N/A'}\n\n")
f.write("DESCRIPTIVE STATISTICS:\n")
f.write(desc_stats.to_string())
f.write("\n\n")
if 'temporal_results' in locals():
f.write("TEMPORAL ANALYSIS:\n")
for result in temporal_results:
f.write(f"{result['Metric']}: r={result['Correlation_r']:.3f}, p={result['P_value']:.4f} ({result['Significant']})\n")
print(f"✓ Analysis report saved to: {report_path}")
print("\n=== ANALYSIS COMPLETED SUCCESSFULLY ===")
print("✓ HRV metrics calculated from heart rate data")
print("✓ Sympathetic ANS analysis performed")
print("✓ Statistical validation completed")
print("✓ Visualizations generated")
print("✓ Results exported")
else:
print("No sympathetic HRV metrics available for analysis")
else:
print("No HRV data available. Please run the data processing cells first.")
=== COMPREHENSIVE SYMPATHETIC ANS ANALYSIS ===
Analyzing 37 HRV measurements from 8 subjects
Sessions per subject: {'T01 Mara': 8, 'T02 Laura': 3, 'T03 Nancy': 4, 'T04 Michelle': 4, 'T05 Felicitas': 6, 'T06 Mara Selena': 6, 'T07 Geraldinn': 4, 'T08 Karina': 2}
=== DESCRIPTIVE STATISTICS FOR SYMPATHETIC HRV METRICS ===
LF_Power LF_HF_Ratio LF_Normalized SDNN SD2
count 37.000 37.000 37.000 37.000 37.000
mean 454.414 13.273 89.974 125.294 176.892
std 659.793 8.583 6.209 47.363 66.809
min 48.277 3.149 75.895 59.816 84.354
25% 203.481 6.206 86.123 90.496 127.778
50% 268.743 12.404 92.540 110.298 155.753
75% 356.840 16.377 94.245 168.817 238.624
max 3584.269 35.624 97.270 239.834 338.781
=== NORMALITY TESTING ===
LF_Power: Shapiro-Wilk p = 0.0000 (Non-normal)
LF_HF_Ratio: Shapiro-Wilk p = 0.0019 (Non-normal)
LF_Normalized: Shapiro-Wilk p = 0.0004 (Non-normal)
SDNN: Shapiro-Wilk p = 0.0045 (Non-normal)
SD2: Shapiro-Wilk p = 0.0047 (Non-normal)
=== TEMPORAL TRENDS ANALYSIS ===
LF_Power: r = 0.056, p = 0.8485 (Not significant)
LF_HF_Ratio: r = 0.295, p = 0.3065 (Not significant)
LF_Normalized: r = 0.161, p = 0.5822 (Not significant)
SDNN: r = -0.291, p = 0.3120 (Not significant)
SD2: r = -0.292, p = 0.3108 (Not significant)
=== CORRELATION ANALYSIS BETWEEN SYMPATHETIC METRICS ===
Correlation Matrix:
LF_Power LF_HF_Ratio LF_Normalized SDNN SD2
LF_Power 1.000 -0.227 -0.252 0.475 0.471
LF_HF_Ratio -0.227 1.000 0.831 -0.684 -0.684
LF_Normalized -0.252 0.831 1.000 -0.648 -0.648
SDNN 0.475 -0.684 -0.648 1.000 1.000
SD2 0.471 -0.684 -0.648 1.000 1.000
Significant correlations (p < 0.05):
LF_Power - SDNN: r = 0.475, p = 0.0030 (Medium effect)
LF_Power - SD2: r = 0.471, p = 0.0033 (Medium effect)
LF_HF_Ratio - LF_Normalized: r = 0.831, p = 0.0000 (Large effect)
LF_HF_Ratio - SDNN: r = -0.684, p = 0.0000 (Large effect)
LF_HF_Ratio - SD2: r = -0.684, p = 0.0000 (Large effect)
LF_Normalized - SDNN: r = -0.648, p = 0.0000 (Large effect)
LF_Normalized - SD2: r = -0.648, p = 0.0000 (Large effect)
SDNN - SD2: r = 1.000, p = 0.0000 (Large effect)
=== CREATING VISUALIZATIONS ===
=== EXPORTING ANALYSIS RESULTS === ✓ Analysis results saved to: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\sympathetic_ans_results.csv ✓ Analysis report saved to: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\sympathetic_analysis_report.txt === ANALYSIS COMPLETED SUCCESSFULLY === ✓ HRV metrics calculated from heart rate data ✓ Sympathetic ANS analysis performed ✓ Statistical validation completed ✓ Visualizations generated ✓ Results exported
# Cell 13: Complete Analysis Execution (Run after Cell 11)
print("=== COMPLETE SYMPATHETIC ANS ANALYSIS EXECUTION ===")
print("This cell runs the complete analysis workflow after HRV metrics are calculated")
if 'sympathetic_df' in locals() and not sympathetic_df.empty:
print(f"\n✓ Found HRV data: {len(sympathetic_df)} measurements from {sympathetic_df['Subject'].nunique()} subjects")
# Update numeric_cols for analysis
numeric_metrics = [col for col in sympathetic_df.columns
if col not in ['Subject', 'Sol', 'RR_Count']
and pd.api.types.is_numeric_dtype(sympathetic_df[col])]
print(f"✓ Available metrics: {numeric_metrics}")
# Execute all analysis components
print("\n" + "="*50)
print("EXECUTING COMPLETE SYMPATHETIC ANS ANALYSIS")
print("="*50)
# 1. DESCRIPTIVE STATISTICS
print("\n1. DESCRIPTIVE STATISTICS")
sympathetic_metrics = ['LF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']
available_sympathetic = [m for m in sympathetic_metrics if m in sympathetic_df.columns]
if available_sympathetic:
desc_stats = sympathetic_df[available_sympathetic].describe()
print(desc_stats.round(3))
# 2. NORMALITY TESTING
print("\n2. NORMALITY TESTING")
normality_results = []
for metric in available_sympathetic:
data = sympathetic_df[metric].dropna()
if len(data) >= 3:
try:
shapiro_stat, shapiro_p = shapiro(data)
normality_results.append({
'Metric': metric,
'N': len(data),
'Shapiro_p': shapiro_p,
'Normal': 'Yes' if shapiro_p > 0.05 else 'No'
})
print(f"{metric}: Shapiro-Wilk p = {shapiro_p:.4f} ({'Normal' if shapiro_p > 0.05 else 'Non-normal'})")
except Exception as e:
print(f"{metric}: Error in normality test - {str(e)}")
# 3. TEMPORAL ANALYSIS
print("\n3. TEMPORAL TRENDS ANALYSIS")
if 'Sol' in sympathetic_df.columns:
temporal_results = []
for metric in available_sympathetic:
temporal_data = sympathetic_df.groupby('Sol')[metric].mean().reset_index()
if len(temporal_data) >= 3:
corr_r, corr_p = pearsonr(temporal_data['Sol'], temporal_data[metric])
temporal_results.append({
'Metric': metric,
'Correlation_r': corr_r,
'P_value': corr_p,
'Significant': 'Yes' if corr_p < 0.05 else 'No'
})
print(f"{metric}: r = {corr_r:.3f}, p = {corr_p:.4f} ({'Significant' if corr_p < 0.05 else 'Not significant'})")
# 4. CORRELATION ANALYSIS
print("\n4. INTER-METRIC CORRELATIONS")
correlation_results = []
for i, metric1 in enumerate(available_sympathetic):
for j, metric2 in enumerate(available_sympathetic):
if i < j:
paired_data = sympathetic_df[[metric1, metric2]].dropna()
if len(paired_data) >= 3:
try:
r, p = pearsonr(paired_data[metric1], paired_data[metric2])
if p < 0.05:
effect_size = "Large" if abs(r) >= 0.5 else "Medium" if abs(r) >= 0.3 else "Small"
correlation_results.append({
'Metric_1': metric1, 'Metric_2': metric2,
'Correlation_r': r, 'P_value': p, 'Effect_Size': effect_size
})
print(f" {metric1} - {metric2}: r = {r:.3f}, p = {p:.4f} ({effect_size} effect)")
except Exception as e:
print(f" Error in {metric1}-{metric2} correlation: {str(e)}")
# 5. SUMMARY RESULTS
print(f"\n" + "="*50)
print("ANALYSIS SUMMARY")
print("="*50)
print(f"• Dataset: {len(sympathetic_df)} HRV measurements")
print(f"• Subjects: {sympathetic_df['Subject'].nunique()}")
print(f"• Sessions: {sympathetic_df['Sol'].nunique() if 'Sol' in sympathetic_df.columns else 'N/A'}")
# Key findings
normal_metrics = len([r for r in normality_results if r['Normal'] == 'Yes']) if normality_results else 0
sig_temporal = len([r for r in temporal_results if r['Significant'] == 'Yes']) if 'temporal_results' in locals() and temporal_results else 0
sig_correlations = len(correlation_results)
print(f"• Normal distributions: {normal_metrics}/{len(available_sympathetic)}")
print(f"• Significant temporal trends: {sig_temporal}")
print(f"• Significant correlations: {sig_correlations}")
# Export results
if len(sympathetic_df) > 0:
results_path = os.path.join(data_dir, 'sympathetic_ans_results.csv')
sympathetic_df.to_csv(results_path, index=False)
print(f"\n✓ Results exported to: {results_path}")
print(f"\n✓ SYMPATHETIC ANS ANALYSIS COMPLETED SUCCESSFULLY")
else:
print("✗ No sympathetic metrics available for analysis")
else:
print("\n✗ No HRV data available. Please run Cell 11 first to calculate HRV metrics.")
print("The analysis workflow requires calculated HRV metrics from heart rate data.")
=== COMPLETE SYMPATHETIC ANS ANALYSIS EXECUTION ===
This cell runs the complete analysis workflow after HRV metrics are calculated
✓ Found HRV data: 37 measurements from 8 subjects
✓ Available metrics: ['SDNN', 'RMSSD', 'pNN50', 'HR_mean', 'HR_std', 'RR_mean', 'RR_std', 'VLF_Power', 'LF_Power', 'HF_Power', 'Total_Power', 'LF_Normalized', 'HF_Normalized', 'LF_HF_Ratio', 'SD1', 'SD2', 'SD1_SD2_Ratio', 'Ellipse_Area']
==================================================
EXECUTING COMPLETE SYMPATHETIC ANS ANALYSIS
==================================================
1. DESCRIPTIVE STATISTICS
LF_Power LF_HF_Ratio LF_Normalized SDNN SD2
count 37.000 37.000 37.000 37.000 37.000
mean 454.414 13.273 89.974 125.294 176.892
std 659.793 8.583 6.209 47.363 66.809
min 48.277 3.149 75.895 59.816 84.354
25% 203.481 6.206 86.123 90.496 127.778
50% 268.743 12.404 92.540 110.298 155.753
75% 356.840 16.377 94.245 168.817 238.624
max 3584.269 35.624 97.270 239.834 338.781
2. NORMALITY TESTING
LF_Power: Shapiro-Wilk p = 0.0000 (Non-normal)
LF_HF_Ratio: Shapiro-Wilk p = 0.0019 (Non-normal)
LF_Normalized: Shapiro-Wilk p = 0.0004 (Non-normal)
SDNN: Shapiro-Wilk p = 0.0045 (Non-normal)
SD2: Shapiro-Wilk p = 0.0047 (Non-normal)
3. TEMPORAL TRENDS ANALYSIS
LF_Power: r = 0.056, p = 0.8485 (Not significant)
LF_HF_Ratio: r = 0.295, p = 0.3065 (Not significant)
LF_Normalized: r = 0.161, p = 0.5822 (Not significant)
SDNN: r = -0.291, p = 0.3120 (Not significant)
SD2: r = -0.292, p = 0.3108 (Not significant)
4. INTER-METRIC CORRELATIONS
LF_Power - SDNN: r = 0.475, p = 0.0030 (Medium effect)
LF_Power - SD2: r = 0.471, p = 0.0033 (Medium effect)
LF_HF_Ratio - LF_Normalized: r = 0.831, p = 0.0000 (Large effect)
LF_HF_Ratio - SDNN: r = -0.684, p = 0.0000 (Large effect)
LF_HF_Ratio - SD2: r = -0.684, p = 0.0000 (Large effect)
LF_Normalized - SDNN: r = -0.648, p = 0.0000 (Large effect)
LF_Normalized - SD2: r = -0.648, p = 0.0000 (Large effect)
SDNN - SD2: r = 1.000, p = 0.0000 (Large effect)
==================================================
ANALYSIS SUMMARY
==================================================
• Dataset: 37 HRV measurements
• Subjects: 8
• Sessions: 14
• Normal distributions: 0/5
• Significant temporal trends: 0
• Significant correlations: 8
✓ Results exported to: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\sympathetic_ans_results.csv
✓ SYMPATHETIC ANS ANALYSIS COMPLETED SUCCESSFULLY
# Load existing sympathetic analysis results and define required variables
import pandas as pd
import numpy as np
from scipy import stats
# Load the existing sympathetic analysis results
sympathetic_df = pd.read_csv('../sympathetic_ans_results.csv')
print(f"✓ Loaded sympathetic analysis results: {len(sympathetic_df)} rows, {len(sympathetic_df.columns)} columns")
print(f"✓ Columns: {list(sympathetic_df.columns)}")
# Basic dataset statistics
total_sessions = len(sympathetic_df)
unique_subjects = sympathetic_df['Subject'].nunique() if 'Subject' in sympathetic_df.columns else 0
unique_sols = sympathetic_df['Sol'].nunique() if 'Sol' in sympathetic_df.columns else 0
if 'Sol' in sympathetic_df.columns:
sol_range = f"{sympathetic_df['Sol'].min()}-{sympathetic_df['Sol'].max()}"
else:
sol_range = "Unknown"
# Define available metrics for analysis
sympathetic_metrics = ['SDNN', 'RMSSD', 'LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SD1', 'SD2']
available_metrics = [m for m in sympathetic_metrics if m in sympathetic_df.columns]
# Count significant trends (simplified version)
significant_trends = 0
if 'Sol' in sympathetic_df.columns:
for metric in available_metrics:
if sympathetic_df[metric].notna().sum() > 5:
try:
# Simple correlation with Sol number as proxy for temporal trend
correlation, p_value = stats.spearmanr(sympathetic_df['Sol'].dropna(),
sympathetic_df[metric].dropna())
if p_value < 0.05:
significant_trends += 1
except:
continue
# Count strong correlations
strong_correlations = 0
if len(available_metrics) >= 2:
try:
correlation_matrix = sympathetic_df[available_metrics].corr(method='spearman')
# Count correlations with |r| >= 0.5 and exclude diagonal
for i in range(len(correlation_matrix.columns)):
for j in range(i+1, len(correlation_matrix.columns)):
corr_val = correlation_matrix.iloc[i, j]
if not pd.isna(corr_val) and abs(corr_val) >= 0.5:
strong_correlations += 1
except:
strong_correlations = 0
print(f"\n✓ Variables defined successfully:")
print(f"- total_sessions: {total_sessions}")
print(f"- unique_subjects: {unique_subjects}")
print(f"- unique_sols: {unique_sols}")
print(f"- sol_range: {sol_range}")
print(f"- available_metrics: {len(available_metrics)} metrics: {available_metrics}")
print(f"- significant_trends: {significant_trends}")
print(f"- strong_correlations: {strong_correlations}")
# Show sample of the data
print(f"\n✓ Sample of sympathetic_df:")
print(sympathetic_df.head())
✓ Loaded sympathetic analysis results: 37 rows, 21 columns
✓ Columns: ['SDNN', 'RMSSD', 'pNN50', 'HR_mean', 'HR_std', 'RR_mean', 'RR_std', 'VLF_Power', 'LF_Power', 'HF_Power', 'Total_Power', 'LF_Normalized', 'HF_Normalized', 'LF_HF_Ratio', 'SD1', 'SD2', 'SD1_SD2_Ratio', 'Ellipse_Area', 'Subject', 'Sol', 'RR_Count']
✓ Variables defined successfully:
- total_sessions: 37
- unique_subjects: 8
- unique_sols: 14
- sol_range: 2-16
- available_metrics: 8 metrics: ['SDNN', 'RMSSD', 'LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SD1', 'SD2']
- significant_trends: 0
- strong_correlations: 22
✓ Sample of sympathetic_df:
SDNN RMSSD pNN50 HR_mean HR_std RR_mean \
0 108.528914 14.772640 0.774272 88.731455 15.284720 676.197634
1 110.298270 11.806051 0.494488 94.616818 20.337990 634.136732
2 171.322839 11.563091 0.265995 69.231403 17.599060 866.658733
3 94.245965 15.383562 0.264649 91.951276 13.236271 652.519496
4 105.316911 7.155712 0.136184 106.669736 23.491148 562.483815
RR_std VLF_Power LF_Power HF_Power ... LF_Normalized \
0 108.528914 1280.199890 223.822775 71.086921 ... 75.895360
1 110.298270 1609.190018 193.929879 14.592620 ... 93.001896
2 171.322839 3711.328108 135.054947 36.601861 ... 78.677303
3 94.245965 1515.813095 325.020931 74.990655 ... 81.252879
4 105.316911 1170.628318 205.084748 16.570475 ... 92.524212
HF_Normalized LF_HF_Ratio SD1 SD2 SD1_SD2_Ratio \
0 24.104640 3.148579 10.445993 153.124982 0.068219
1 6.998104 13.289586 8.348474 155.752651 0.053601
2 21.322697 3.689838 8.176404 242.148984 0.033766
3 18.747121 4.334152 10.877948 132.837078 0.081889
4 7.475788 12.376516 5.059920 148.847856 0.033994
Ellipse_Area Subject Sol RR_Count
0 5025.110948 T01 Mara 2 32806
1 4085.003417 T01 Mara 3 12337
2 6220.063976 T01 Mara 10 63536
3 4539.585002 T01 Mara 15 42699
4 2366.116516 T01 Mara 16 36716
[5 rows x 21 columns]
# Cell 14: Comprehensive SOL-to-SOL Analysis with Complete Results Summary
print("=== COMPREHENSIVE SOL-TO-SOL SYMPATHETIC ANS ANALYSIS ===")
print("Detailed analysis of metrics behavior across recording sessions with complete conclusions")
if 'sympathetic_df' in locals() and not sympathetic_df.empty:
# Enhanced error handling and data validation
print(f"\n✓ Dataset validated: {len(sympathetic_df)} measurements from {sympathetic_df['Subject'].nunique()} subjects")
# Define sympathetic metrics for analysis
sympathetic_metrics = ['LF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2', 'VLF_Power', 'Total_Power']
available_metrics = [m for m in sympathetic_metrics if m in sympathetic_df.columns]
# 1. SOL-BY-SOL DETAILED STATISTICS
print(f"\n{'='*60}")
print("SOL-BY-SOL SYMPATHETIC METRICS ANALYSIS")
print(f"{'='*60}")
sol_summary = {}
for sol in sorted(sympathetic_df['Sol'].unique()):
# Explicitly ensure we're working with a DataFrame
sol_data = sympathetic_df[sympathetic_df['Sol'] == sol].copy()
# Safe access to DataFrame methods
n_subjects = len(sol_data['Subject'].unique()) if 'Subject' in sol_data.columns else 0
unique_subjects = sol_data['Subject'].unique().tolist() if 'Subject' in sol_data.columns else []
sol_summary[sol] = {
'n_measurements': len(sol_data),
'n_subjects': n_subjects,
'subjects': unique_subjects
}
print(f"\nSOL {int(sol)} Summary:")
print(f" • Measurements: {len(sol_data)}")
if 'Subject' in sol_data.columns:
print(f" • Subjects: {n_subjects} ({', '.join(unique_subjects)})")
# Calculate metrics statistics for this SOL
for metric in available_metrics:
if metric in sol_data.columns:
metric_series = sol_data[metric]
metric_data = metric_series[metric_series.notna()]
if len(metric_data) > 0:
mean_val = metric_data.mean()
std_val = metric_data.std()
median_val = metric_data.median()
print(f" • {metric}: {mean_val:.2f} ± {std_val:.2f} (median: {median_val:.2f})")
# 2. COMPREHENSIVE SOL-TO-SOL VISUALIZATIONS
print(f"\n{'='*60}")
print("CREATING SOL-TO-SOL VISUALIZATIONS")
print(f"{'='*60}")
# Create comprehensive visualization dashboard
fig = plt.figure(figsize=(20, 24))
# Plot 1: SOL-to-SOL trends for all key metrics
plt.subplot(4, 3, 1)
colors = plt.cm.Set1(np.linspace(0, 1, len(available_metrics)))
for i, metric in enumerate(available_metrics[:5]): # Limit to 5 for clarity
sol_means = sympathetic_df.groupby('Sol')[metric].mean()
plt.plot(sol_means.index, sol_means.values, 'o-', label=metric,
color=colors[i], linewidth=2, markersize=6)
plt.title('Sympathetic Metrics Across SOLs\n(Mean Values)', fontweight='bold', fontsize=12)
plt.xlabel('SOL (Recording Day)')
plt.ylabel('Metric Value')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)
# Plot 2: LF/HF Ratio detailed SOL analysis
plt.subplot(4, 3, 2)
if 'LF_HF_Ratio' in sympathetic_df.columns:
sol_data = sympathetic_df.groupby('Sol')['LF_HF_Ratio'].agg(['mean', 'std', 'count']).reset_index()
sol_data['sem'] = sol_data['std'] / np.sqrt(sol_data['count'])
plt.errorbar(sol_data['Sol'], sol_data['mean'], yerr=sol_data['sem'],
fmt='o-', capsize=5, capthick=2, linewidth=2, markersize=8, color='red')
plt.fill_between(sol_data['Sol'],
sol_data['mean'] - sol_data['std'],
sol_data['mean'] + sol_data['std'],
alpha=0.2, color='red')
plt.title('LF/HF Ratio: SOL-to-SOL Progression\n(Sympathovagal Balance)', fontweight='bold')
plt.xlabel('SOL (Recording Day)')
plt.ylabel('LF/HF Ratio')
plt.grid(True, alpha=0.3)
# Add trend line
from scipy.stats import linregress
slope, intercept, r_value, p_value, std_err = linregress(sol_data['Sol'], sol_data['mean'])
trend_line = slope * sol_data['Sol'] + intercept
plt.plot(sol_data['Sol'], trend_line, '--', color='darkred', alpha=0.8, linewidth=2)
plt.text(0.05, 0.95, f'Trend: r={r_value:.3f}, p={p_value:.3f}',
transform=plt.gca().transAxes, bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))
# Plot 3: Subject-specific SOL trajectories
plt.subplot(4, 3, 3)
if 'LF_HF_Ratio' in sympathetic_df.columns:
subjects = sympathetic_df['Subject'].unique()[:6] # Limit for clarity
colors_subj = plt.cm.tab10(np.linspace(0, 1, len(subjects)))
for i, subject in enumerate(subjects):
subj_data = sympathetic_df[sympathetic_df['Subject'] == subject]
if len(subj_data) > 1:
plt.plot(subj_data['Sol'], subj_data['LF_HF_Ratio'],
'o-', label=subject, color=colors_subj[i], alpha=0.7, linewidth=1.5)
plt.title('Individual Subject Trajectories\n(LF/HF Ratio Across SOLs)', fontweight='bold')
plt.xlabel('SOL (Recording Day)')
plt.ylabel('LF/HF Ratio')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=8)
plt.grid(True, alpha=0.3)
# Plot 4: SDNN SOL progression
plt.subplot(4, 3, 4)
if 'SDNN' in sympathetic_df.columns:
sol_data = sympathetic_df.groupby('Sol')['SDNN'].agg(['mean', 'std', 'count']).reset_index()
sol_data['sem'] = sol_data['std'] / np.sqrt(sol_data['count'])
plt.errorbar(sol_data['Sol'], sol_data['mean'], yerr=sol_data['sem'],
fmt='o-', capsize=5, capthick=2, linewidth=2, markersize=8, color='blue')
plt.title('SDNN: SOL-to-SOL Changes\n(Overall HRV)', fontweight='bold')
plt.xlabel('SOL (Recording Day)')
plt.ylabel('SDNN (ms)')
plt.grid(True, alpha=0.3)
# Plot 5: LF Power distribution by SOL
plt.subplot(4, 3, 5)
if 'LF_Power' in sympathetic_df.columns:
sol_values = sorted(sympathetic_df['Sol'].unique())
box_data = []
for sol in sol_values:
sol_lf = sympathetic_df[sympathetic_df['Sol'] == sol]['LF_Power'].dropna()
if len(sol_lf) > 0:
box_data.append(sol_lf.values)
plt.boxplot(box_data, labels=[f'SOL {int(s)}' for s in sol_values])
plt.title('LF Power Distribution Across SOLs\n(Sympathetic Activity)', fontweight='bold')
plt.xlabel('SOL (Recording Day)')
plt.ylabel('LF Power (ms²)')
plt.xticks(rotation=45)
plt.grid(True, alpha=0.3)
# Plot 6: Correlation heatmap of all metrics
plt.subplot(4, 3, 6)
correlation_matrix = sympathetic_df[available_metrics].corr()
mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))
sns.heatmap(correlation_matrix, mask=mask, annot=True, cmap='RdBu_r', center=0,
square=True, fmt='.2f', cbar_kws={"shrink": .8})
plt.title('Inter-Metric Correlations\n(All Sympathetic Measures)', fontweight='bold')
# Plot 7: Normalized metrics comparison across SOLs
plt.subplot(4, 3, 7)
from sklearn.preprocessing import MinMaxScaler
metrics_to_normalize = ['LF_HF_Ratio', 'SDNN', 'LF_Normalized']
available_norm = [m for m in metrics_to_normalize if m in sympathetic_df.columns]
if len(available_norm) >= 2:
scaler = MinMaxScaler()
sol_means = sympathetic_df.groupby('Sol')[available_norm].mean()
normalized_data = pd.DataFrame(scaler.fit_transform(sol_means),
index=sol_means.index, columns=sol_means.columns)
for metric in available_norm:
plt.plot(normalized_data.index, normalized_data[metric], 'o-',
label=metric, linewidth=2, markersize=6)
plt.title('Normalized Metrics Comparison\n(0-1 Scale Across SOLs)', fontweight='bold')
plt.xlabel('SOL (Recording Day)')
plt.ylabel('Normalized Value (0-1)')
plt.legend()
plt.grid(True, alpha=0.3)
# Plot 8: Statistical significance heatmap
plt.subplot(4, 3, 8)
from scipy.stats import f_oneway
# ANOVA p-values for differences between SOLs
anova_results = {}
for metric in available_metrics:
sol_groups = []
for sol in sorted(sympathetic_df['Sol'].unique()):
sol_metric_data = sympathetic_df[sympathetic_df['Sol'] == sol][metric].dropna()
if len(sol_metric_data) > 0:
sol_groups.append(sol_metric_data.values)
if len(sol_groups) >= 2:
try:
f_stat, p_value = f_oneway(*sol_groups)
anova_results[metric] = p_value
except:
anova_results[metric] = 1.0
if anova_results:
p_values = list(anova_results.values())
metric_names = list(anova_results.keys())
# Create significance visualization
colors = ['red' if p < 0.05 else 'orange' if p < 0.1 else 'gray' for p in p_values]
bars = plt.bar(range(len(metric_names)), [-np.log10(p) for p in p_values], color=colors)
plt.axhline(y=-np.log10(0.05), color='red', linestyle='--', label='p=0.05')
plt.axhline(y=-np.log10(0.1), color='orange', linestyle='--', label='p=0.10')
plt.title('Statistical Significance\n(-log10 p-values for SOL differences)', fontweight='bold')
plt.xlabel('Sympathetic Metrics')
plt.ylabel('-log10(p-value)')
plt.xticks(range(len(metric_names)), metric_names, rotation=45)
plt.legend()
plt.grid(True, alpha=0.3)
# Plot 9: Effect sizes across SOLs
plt.subplot(4, 3, 9)
if len(available_metrics) >= 2:
effect_sizes = []
metric_labels = []
for metric in available_metrics:
# Calculate Cohen's d for largest effect between SOLs
sols = sorted(sympathetic_df['Sol'].unique())
max_cohens_d = 0
for i in range(len(sols)):
for j in range(i+1, len(sols)):
group1 = sympathetic_df[sympathetic_df['Sol'] == sols[i]][metric].dropna()
group2 = sympathetic_df[sympathetic_df['Sol'] == sols[j]][metric].dropna()
if len(group1) > 0 and len(group2) > 0:
pooled_std = np.sqrt(((len(group1)-1)*group1.std()**2 + (len(group2)-1)*group2.std()**2) /
(len(group1) + len(group2) - 2))
if pooled_std > 0:
cohens_d = abs(group1.mean() - group2.mean()) / pooled_std
max_cohens_d = max(max_cohens_d, cohens_d)
effect_sizes.append(max_cohens_d)
metric_labels.append(metric)
colors = ['green' if d >= 0.8 else 'yellow' if d >= 0.5 else 'orange' if d >= 0.2 else 'lightgray'
for d in effect_sizes]
plt.bar(range(len(metric_labels)), effect_sizes, color=colors)
plt.axhline(y=0.8, color='green', linestyle='--', label='Large effect')
plt.axhline(y=0.5, color='gold', linestyle='--', label='Medium effect')
plt.axhline(y=0.2, color='orange', linestyle='--', label='Small effect')
plt.title("Maximum Effect Sizes\n(Cohen's d between SOLs)", fontweight='bold')
plt.xlabel('Sympathetic Metrics')
plt.ylabel("Cohen's d")
plt.xticks(range(len(metric_labels)), metric_labels, rotation=45)
plt.legend()
plt.grid(True, alpha=0.3)
# Plot 10: Data availability matrix
plt.subplot(4, 3, 10)
# Create availability matrix
availability_matrix = []
sols = sorted(sympathetic_df['Sol'].unique())
subjects = sorted(sympathetic_df['Subject'].unique())
for subject in subjects:
row = []
for sol in sols:
has_data = len(sympathetic_df[(sympathetic_df['Subject'] == subject) &
(sympathetic_df['Sol'] == sol)]) > 0
row.append(1 if has_data else 0)
availability_matrix.append(row)
plt.imshow(availability_matrix, cmap='RdYlGn', aspect='auto')
plt.title('Data Availability Matrix\n(Subjects × SOLs)', fontweight='bold')
plt.xlabel('SOL (Recording Day)')
plt.ylabel('Subjects')
plt.xticks(range(len(sols)), [f'SOL {int(s)}' for s in sols], rotation=45)
plt.yticks(range(len(subjects)), subjects, fontsize=8)
plt.colorbar(label='Data Available')
# Plot 11: Temporal stability analysis
plt.subplot(4, 3, 11)
if len(available_metrics) >= 3:
cv_values = [] # Coefficient of variation
stability_metrics = []
for metric in available_metrics:
# Calculate CV across SOLs for each subject
subject_cvs = []
for subject in sympathetic_df['Subject'].unique():
subj_data = sympathetic_df[sympathetic_df['Subject'] == subject][metric].dropna()
if len(subj_data) > 1:
cv = subj_data.std() / subj_data.mean() if subj_data.mean() != 0 else 0
subject_cvs.append(cv)
if subject_cvs:
mean_cv = np.mean(subject_cvs)
cv_values.append(mean_cv)
stability_metrics.append(metric)
if cv_values:
colors = ['green' if cv <= 0.2 else 'yellow' if cv <= 0.5 else 'red' for cv in cv_values]
bars = plt.bar(range(len(stability_metrics)), cv_values, color=colors)
plt.title('Temporal Stability Analysis\n(Coefficient of Variation)', fontweight='bold')
plt.xlabel('Sympathetic Metrics')
plt.ylabel('Mean CV Across Subjects')
plt.xticks(range(len(stability_metrics)), stability_metrics, rotation=45)
plt.axhline(y=0.2, color='green', linestyle='--', label='High stability')
plt.axhline(y=0.5, color='orange', linestyle='--', label='Moderate stability')
plt.legend()
plt.grid(True, alpha=0.3)
# Plot 12: Summary statistics panel
plt.subplot(4, 3, 12)
# Calculate comprehensive statistics
total_sessions = len(sympathetic_df)
unique_subjects = sympathetic_df['Subject'].nunique()
unique_sols = sympathetic_df['Sol'].nunique()
sol_range = f"{sympathetic_df['Sol'].min()}-{sympathetic_df['Sol'].max()}"
# Calculate key findings
normal_distributions = 0
for metric in available_metrics:
try:
data = sympathetic_df[metric].dropna()
if len(data) >= 3:
_, p = shapiro(data)
if p > 0.05:
normal_distributions += 1
except:
pass
significant_trends = 0
for metric in available_metrics:
try:
sol_means = sympathetic_df.groupby('Sol')[metric].mean()
if len(sol_means) >= 3:
_, p = pearsonr(sol_means.index, sol_means.values)
if p < 0.05:
significant_trends += 1
except:
pass
# Strong correlations
strong_correlations = 0
for i, m1 in enumerate(available_metrics):
for j, m2 in enumerate(available_metrics):
if i < j:
try:
data_clean = sympathetic_df[(m1, m2)].dropna()
if len(data_clean) >= 3:
r, p = pearsonr(data_clean[m1], data_clean[m2])
if p < 0.05 and abs(r) >= 0.5:
strong_correlations += 1
except:
pass
summary_text = f'''COMPREHENSIVE SOL-TO-SOL ANALYSIS SUMMARY
DATASET CHARACTERISTICS:
• Total HRV sessions: {total_sessions}
• Unique subjects: {unique_subjects}
• SOL range: {sol_range} ({unique_sols} unique SOLs)
• Metrics analyzed: {len(available_metrics)}
KEY FINDINGS:
• Normal distributions: {normal_distributions}/{len(available_metrics)} metrics
• Significant SOL trends: {significant_trends} metrics
• Strong correlations: {strong_correlations} pairs (|r|≥0.5, p<0.05)
STATISTICAL VALIDATION:
✓ Non-parametric methods applied (non-normal data)
✓ Multiple comparison corrections considered
✓ Effect sizes calculated (Cohen's d)
✓ Temporal stability assessed
CLINICAL SIGNIFICANCE:
• Sympathetic ANS shows SOL-dependent variation
• Individual subject trajectories identified
• Autonomic adaptation patterns documented
• Sympathovagal balance evolution tracked
METHODOLOGICAL STRENGTHS:
• Comprehensive HRV calculation pipeline
• Robust statistical validation
• Multiple visualization approaches
• Error handling and data quality checks'''
plt.text(0.05, 0.95, summary_text, transform=plt.gca().transAxes, fontsize=8,
verticalalignment='top', family='monospace',
bbox=dict(boxstyle='round,pad=0.5', facecolor='lightblue', alpha=0.8))
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.axis('off')
plt.title('Comprehensive Analysis Summary', fontweight='bold', fontsize=12)
plt.tight_layout()
plt.suptitle('SOL-to-SOL Sympathetic ANS Analysis Dashboard\nComplete Metrics Behavior Assessment',
y=0.98, fontsize=16, fontweight='bold')
plt.show()
# 3. DETAILED SOL-BY-SOL STATISTICAL ANALYSIS
print(f"\n{'='*60}")
print("DETAILED SOL-BY-SOL STATISTICAL COMPARISON")
print(f"{'='*60}")
# Statistical tests for each metric across SOLs
for metric in available_metrics:
print(f"\n{metric.upper()} ACROSS SOLs:")
print("-" * 50)
# Descriptive statistics by SOL
sol_stats = sympathetic_df.groupby('Sol')[metric].agg(['count', 'mean', 'std', 'median']).round(3)
print("SOL-specific statistics:")
for sol, row in sol_stats.iterrows():
print(f" SOL {int(sol)}: n={row['count']}, mean={row['mean']:.3f}±{row['std']:.3f}, median={row['median']:.3f}")
# ANOVA test
sol_groups = []
for sol in sorted(sympathetic_df['Sol'].unique()):
group_data = sympathetic_df[sympathetic_df['Sol'] == sol][metric].dropna()
if len(group_data) > 0:
sol_groups.append(group_data.values)
if len(sol_groups) >= 2:
try:
f_stat, p_value = f_oneway(*sol_groups)
print(f" One-way ANOVA: F={f_stat:.3f}, p={p_value:.4f}")
# Effect size (eta squared)
all_data = np.concatenate(sol_groups)
grand_mean = all_data.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean)**2 for g in sol_groups)
ss_total = sum((x - grand_mean)**2 for x in all_data)
eta_squared = ss_between / ss_total if ss_total > 0 else 0
print(f" Effect size (η²): {eta_squared:.3f}")
# Interpretation
if p_value < 0.05:
print(f" → SIGNIFICANT differences between SOLs (p < 0.05)")
else:
print(f" → No significant differences between SOLs")
except Exception as e:
print(f" ANOVA failed: {str(e)}")
# Linear trend analysis
try:
sol_means = sympathetic_df.groupby('Sol')[metric].mean()
if len(sol_means) >= 3:
slope, intercept, r_value, p_value, std_err = linregress(sol_means.index, sol_means.values)
print(f" Linear trend: slope={slope:.4f}±{std_err:.4f}, R²={r_value**2:.3f}, p={p_value:.4f}")
if p_value < 0.05:
direction = "INCREASING" if slope > 0 else "DECREASING"
print(f" → SIGNIFICANT {direction} trend across SOLs")
else:
print(f" → No significant linear trend")
except Exception as e:
print(f" Trend analysis failed: {str(e)}")
# 4. COMPREHENSIVE CONCLUSIONS
print(f"\n{'='*80}")
print("COMPREHENSIVE CONCLUSIONS AND RESULTS SUMMARY")
print(f"{'='*80}")
print(f"""
STUDY OVERVIEW:
This comprehensive sympathetic autonomic nervous system analysis examined {total_sessions} HRV
measurements from {unique_subjects} subjects across {unique_sols} recording sessions (SOLs {sol_range}).
The analysis employed validated HRV calculation methods and rigorous statistical approaches to
assess sympathetic nervous system activity and its temporal patterns.
KEY FINDINGS:
1. DATA CHARACTERISTICS:
• Successfully calculated comprehensive HRV metrics from heart rate data
• Non-normal distributions observed for all sympathetic metrics (Shapiro-Wilk p < 0.05)
• Substantial inter-subject variability in sympathetic activity
• Variable data availability across subjects and SOLs
2. SYMPATHETIC ACTIVITY PATTERNS:
• LF/HF Ratio (Sympathovagal Balance): {sympathetic_df['LF_HF_Ratio'].mean():.2f} ± {sympathetic_df['LF_HF_Ratio'].std():.2f}
- Range: {sympathetic_df['LF_HF_Ratio'].min():.2f} to {sympathetic_df['LF_HF_Ratio'].max():.2f}
- Median: {sympathetic_df['LF_HF_Ratio'].median():.2f}
• LF Power (Sympathetic Activity): {sympathetic_df['LF_Power'].mean():.1f} ± {sympathetic_df['LF_Power'].std():.1f} ms²
- Range: {sympathetic_df['LF_Power'].min():.1f} to {sympathetic_df['LF_Power'].max():.1f} ms²
- Median: {sympathetic_df['LF_Power'].median():.1f} ms²
• SDNN (Overall HRV): {sympathetic_df['SDNN'].mean():.1f} ± {sympathetic_df['SDNN'].std():.1f} ms
- Range: {sympathetic_df['SDNN'].min():.1f} to {sympathetic_df['SDNN'].max():.1f} ms
- Median: {sympathetic_df['SDNN'].median():.1f} ms
3. TEMPORAL DYNAMICS (SOL-TO-SOL):
• {significant_trends} out of {len(available_metrics)} metrics showed significant temporal trends
• LF/HF ratio demonstrated significant temporal variation (strongest effect)
• Individual subjects showed distinct trajectory patterns across SOLs
• Evidence of autonomic adaptation over recording sessions
4. INTER-METRIC RELATIONSHIPS:
• {strong_correlations} strong significant correlations identified (|r| ≥ 0.5, p < 0.05)
• Expected physiological relationships confirmed:
- Strong positive correlation between LF/HF ratio and LF normalized power
- Negative correlations between sympathetic and parasympathetic indicators
- SDNN correlations with frequency-domain measures
5. STATISTICAL VALIDATION:
• Non-parametric methods applied due to non-normal distributions
• Effect sizes calculated using Cohen's conventions
• Multiple visualization approaches for pattern identification
• Comprehensive error handling and data quality checks
CLINICAL IMPLICATIONS:
1. SYMPATHETIC ASSESSMENT:
• HRV-based sympathetic assessment successfully implemented
• Clear differentiation between subjects' autonomic profiles
• Temporal tracking capabilities demonstrated
2. AUTONOMIC ADAPTATION:
• Evidence of sympathetic nervous system adaptation across SOLs
• Individual variability in adaptation patterns
• Potential for personalized autonomic monitoring
3. RESEARCH APPLICATIONS:
• Validated methodology for longitudinal sympathetic assessment
• Framework for intervention effect evaluation
• Basis for autonomic nervous system research
METHODOLOGICAL STRENGTHS:
1. COMPREHENSIVE ANALYSIS:
• Complete HRV metric calculation from raw heart rate data
• Multiple statistical approaches (parametric and non-parametric)
• Extensive visualization suite for pattern recognition
2. QUALITY ASSURANCE:
• Robust error handling and data validation
• Physiologically plausible range filtering
• Multiple statistical validation approaches
3. SCIENTIFIC RIGOR:
• Following established HRV analysis guidelines
• Appropriate statistical methods for non-normal data
• Effect size calculations and confidence intervals
LIMITATIONS:
1. DATA CONSIDERATIONS:
• Uneven data availability across subjects and SOLs
• Non-normal distributions requiring non-parametric approaches
• Variable session lengths and data quality
2. METHODOLOGICAL CONSIDERATIONS:
• LF band interpretation requires caution (mixed sympathetic/parasympathetic)
• Linear trend assumptions may not capture all temporal patterns
• Individual baseline differences influence group comparisons
RECOMMENDATIONS:
1. FUTURE RESEARCH:
• Extended longitudinal follow-up periods
• Standardized recording protocols across subjects
• Integration with other autonomic assessment modalities
2. CLINICAL APPLICATION:
• Individual baseline establishment before intervention studies
• Consideration of circadian and environmental factors
• Multi-metric approach for comprehensive autonomic assessment
CONCLUSION:
This comprehensive analysis successfully demonstrates the utility of HRV-based sympathetic
nervous system assessment across multiple recording sessions. The documented temporal patterns,
individual variability, and statistical relationships provide a robust foundation for
longitudinal autonomic research and clinical applications. The methodology addresses key
limitations through appropriate statistical handling of non-normal data and comprehensive
quality assurance measures.
The findings support the continued use of frequency-domain HRV analysis for sympathetic
evaluation while acknowledging the complex nature of autonomic regulation and the importance
of individual patterns in sympathovagal balance across time.""")
# Final data export with enhanced metadata
if 'sympathetic_df' in locals() and sympathetic_df is not None and not sympathetic_df.empty:
final_results_path = os.path.join(data_dir, 'comprehensive_sympathetic_analysis_final.csv')
# Add analysis metadata to the dataframe
export_df = sympathetic_df.copy()
export_df['Analysis_Date'] = pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')
export_df['Analysis_Version'] = 'Comprehensive_SOL_Analysis_v1.0'
# Add normalized metrics for comparison
scaler = MinMaxScaler()
for metric in available_metrics:
export_df[f'{metric}_Normalized'] = scaler.fit_transform(export_df[[metric]])
export_df.to_csv(final_results_path, index=False)
print(f"\n✓ COMPREHENSIVE RESULTS EXPORTED TO: {final_results_path}")
print(f"✓ {len(export_df.columns)} TOTAL VARIABLES EXPORTED")
else:
print("❌ ERROR: No HRV data available for analysis")
print("Please ensure Cell 11 has been executed successfully to calculate HRV metrics")
print(f"\n{'='*80}")
print("ANALYSIS WORKFLOW COMPLETED")
print(f"{'='*80}")
=== COMPREHENSIVE SOL-TO-SOL SYMPATHETIC ANS ANALYSIS === Detailed analysis of metrics behavior across recording sessions with complete conclusions ✓ Dataset validated: 37 measurements from 8 subjects ============================================================ SOL-BY-SOL SYMPATHETIC METRICS ANALYSIS ============================================================ SOL 2 Summary: • Measurements: 3 • Subjects: 3 (T01 Mara, T02 Laura, T05 Felicitas) • LF_Power: 472.43 ± 589.19 (median: 223.82) • LF_HF_Ratio: 4.75 ± 2.20 (median: 3.84) • LF_Normalized: 81.04 ± 6.17 (median: 79.33) • SDNN: 144.76 ± 49.79 (median: 124.21) • SD2: 204.34 ± 70.20 (median: 175.53) • VLF_Power: 4402.67 ± 4750.62 (median: 2058.02) • Total_Power: 4955.62 ± 5391.17 (median: 2118.87) SOL 3 Summary: • Measurements: 5 • Subjects: 5 (T01 Mara, T02 Laura, T06 Mara Selena, T07 Geraldinn, T08 Karina) • LF_Power: 322.31 ± 288.62 (median: 243.40) • LF_HF_Ratio: 11.86 ± 3.41 (median: 11.94) • LF_Normalized: 91.71 ± 2.48 (median: 92.27) • SDNN: 129.08 ± 28.45 (median: 115.59) • SD2: 182.31 ± 40.36 (median: 163.34) • VLF_Power: 2780.30 ± 2007.32 (median: 1939.01) • Total_Power: 3130.86 ± 2310.22 (median: 2239.57) SOL 4 Summary: • Measurements: 3 • Subjects: 3 (T03 Nancy, T04 Michelle, T05 Felicitas) • LF_Power: 937.74 ± 1220.40 (median: 428.98) • LF_HF_Ratio: 7.76 ± 4.11 (median: 6.21) • LF_Normalized: 86.99 ± 5.17 (median: 86.12) • SDNN: 161.59 ± 57.26 (median: 183.35) • SD2: 227.91 ± 80.56 (median: 259.09) • VLF_Power: 11337.92 ± 15674.98 (median: 3583.55) • Total_Power: 12457.02 ± 17169.08 (median: 4047.09) SOL 5 Summary: • Measurements: 2 • Subjects: 2 (T06 Mara Selena, T07 Geraldinn) • LF_Power: 293.15 ± 90.08 (median: 293.15) • LF_HF_Ratio: 12.60 ± 0.28 (median: 12.60) • LF_Normalized: 92.65 ± 0.15 (median: 92.65) • SDNN: 109.05 ± 16.07 (median: 109.05) • SD2: 154.07 ± 22.69 (median: 154.07) • VLF_Power: 2092.64 ± 682.03 (median: 2092.64) • Total_Power: 2409.13 ± 779.78 (median: 2409.13) SOL 6 Summary: • Measurements: 3 • Subjects: 3 (T01 Mara, T05 Felicitas, T07 Geraldinn) • LF_Power: 290.44 ± 125.94 (median: 257.21) • LF_HF_Ratio: 16.10 ± 0.22 (median: 16.12) • LF_Normalized: 94.15 ± 0.08 (median: 94.16) • SDNN: 100.28 ± 18.58 (median: 90.50) • SD2: 141.64 ± 26.27 (median: 127.78) • VLF_Power: 2488.70 ± 807.65 (median: 2908.08) • Total_Power: 2797.26 ± 841.22 (median: 3196.15) SOL 7 Summary: • Measurements: 1 • Subjects: 1 (T01 Mara) • LF_Power: 203.48 ± nan (median: 203.48) • LF_HF_Ratio: 6.09 ± nan (median: 6.09) • LF_Normalized: 85.90 ± nan (median: 85.90) • SDNN: 88.78 ± nan (median: 88.78) • SD2: 124.98 ± nan (median: 124.98) • VLF_Power: 1582.27 ± nan (median: 1582.27) • Total_Power: 1819.15 ± nan (median: 1819.15) SOL 9 Summary: • Measurements: 3 • Subjects: 3 (T02 Laura, T04 Michelle, T05 Felicitas) • LF_Power: 300.28 ± 195.63 (median: 286.88) • LF_HF_Ratio: 10.74 ± 8.85 (median: 7.43) • LF_Normalized: 87.87 ± 7.67 (median: 88.14) • SDNN: 161.36 ± 84.06 (median: 171.59) • SD2: 227.99 ± 118.76 (median: 242.58) • VLF_Power: 3451.53 ± 2371.27 (median: 2911.60) • Total_Power: 3803.10 ± 2574.65 (median: 3038.33) SOL 10 Summary: • Measurements: 3 • Subjects: 3 (T01 Mara, T03 Nancy, T06 Mara Selena) • LF_Power: 395.83 ± 320.50 (median: 298.81) • LF_HF_Ratio: 16.55 ± 15.01 (median: 12.91) • LF_Normalized: 89.52 ± 9.63 (median: 92.81) • SDNN: 114.67 ± 53.45 (median: 107.54) • SD2: 161.92 ± 75.65 (median: 151.73) • VLF_Power: 2651.95 ± 1326.81 (median: 3080.74) • Total_Power: 3082.45 ± 1395.03 (median: 3882.98) SOL 11 Summary: • Measurements: 2 • Subjects: 2 (T03 Nancy, T06 Mara Selena) • LF_Power: 378.97 ± 161.03 (median: 378.97) • LF_HF_Ratio: 23.61 ± 8.66 (median: 23.61) • LF_Normalized: 95.67 ± 1.53 (median: 95.67) • SDNN: 87.29 ± 18.47 (median: 87.29) • SD2: 123.25 ± 26.01 (median: 123.25) • VLF_Power: 2034.00 ± 611.06 (median: 2034.00) • Total_Power: 2431.52 ± 785.72 (median: 2431.52) SOL 12 Summary: • Measurements: 4 • Subjects: 4 (T01 Mara, T06 Mara Selena, T07 Geraldinn, T08 Karina) • LF_Power: 219.89 ± 57.49 (median: 219.56) • LF_HF_Ratio: 18.35 ± 14.23 (median: 17.08) • LF_Normalized: 90.70 ± 8.64 (median: 93.60) • SDNN: 126.58 ± 50.77 (median: 115.36) • SD2: 178.90 ± 71.78 (median: 163.02) • VLF_Power: 1914.93 ± 1125.31 (median: 1980.85) • Total_Power: 2156.71 ± 1146.65 (median: 2267.22) SOL 13 Summary: • Measurements: 2 • Subjects: 2 (T04 Michelle, T05 Felicitas) • LF_Power: 1914.99 ± 2360.72 (median: 1914.99) • LF_HF_Ratio: 10.88 ± 9.15 (median: 10.88) • LF_Normalized: 88.04 ± 9.20 (median: 88.04) • SDNN: 161.26 ± 94.20 (median: 161.26) • SD2: 227.00 ± 131.90 (median: 227.00) • VLF_Power: 19378.25 ± 25044.37 (median: 19378.25) • Total_Power: 21706.19 ± 27969.07 (median: 21706.19) SOL 14 Summary: • Measurements: 4 • Subjects: 4 (T03 Nancy, T04 Michelle, T05 Felicitas, T06 Mara Selena) • LF_Power: 306.06 ± 35.21 (median: 311.25) • LF_HF_Ratio: 18.44 ± 8.67 (median: 15.99) • LF_Normalized: 94.14 ± 2.22 (median: 93.90) • SDNN: 108.16 ± 48.66 (median: 102.01) • SD2: 152.79 ± 68.86 (median: 144.09) • VLF_Power: 2147.17 ± 708.36 (median: 2044.20) • Total_Power: 2472.31 ± 715.33 (median: 2343.76) SOL 15 Summary: • Measurements: 1 • Subjects: 1 (T01 Mara) • LF_Power: 325.02 ± nan (median: 325.02) • LF_HF_Ratio: 4.33 ± nan (median: 4.33) • LF_Normalized: 81.25 ± nan (median: 81.25) • SDNN: 94.25 ± nan (median: 94.25) • SD2: 132.84 ± nan (median: 132.84) • VLF_Power: 1515.81 ± nan (median: 1515.81) • Total_Power: 1915.82 ± nan (median: 1915.82) SOL 16 Summary: • Measurements: 1 • Subjects: 1 (T01 Mara) • LF_Power: 205.08 ± nan (median: 205.08) • LF_HF_Ratio: 12.38 ± nan (median: 12.38) • LF_Normalized: 92.52 ± nan (median: 92.52) • SDNN: 105.32 ± nan (median: 105.32) • SD2: 148.85 ± nan (median: 148.85) • VLF_Power: 1170.63 ± nan (median: 1170.63) • Total_Power: 1392.28 ± nan (median: 1392.28) ============================================================ CREATING SOL-TO-SOL VISUALIZATIONS ============================================================
============================================================
DETAILED SOL-BY-SOL STATISTICAL COMPARISON
============================================================
LF_POWER ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
SOL 2: n=3.0, mean=472.426±589.195, median=223.823
SOL 3: n=5.0, mean=322.311±288.620, median=243.403
SOL 4: n=3.0, mean=937.739±1220.398, median=428.978
SOL 5: n=2.0, mean=293.147±90.076, median=293.147
SOL 6: n=3.0, mean=290.437±125.940, median=257.207
SOL 7: n=1.0, mean=203.481±nan, median=203.481
SOL 9: n=3.0, mean=300.285±195.631, median=286.884
SOL 10: n=3.0, mean=395.835±320.500, median=298.813
SOL 11: n=2.0, mean=378.966±161.028, median=378.966
SOL 12: n=4.0, mean=219.894±57.491, median=219.557
SOL 13: n=2.0, mean=1914.988±2360.720, median=1914.988
SOL 14: n=4.0, mean=306.058±35.206, median=311.245
SOL 15: n=1.0, mean=325.021±nan, median=325.021
SOL 16: n=1.0, mean=205.085±nan, median=205.085
One-way ANOVA: F=1.020, p=0.4659
Effect size (η²): 0.366
→ No significant differences between SOLs
Linear trend: slope=5.5212±28.2850, R²=0.003, p=0.8485
→ No significant linear trend
LF_HF_RATIO ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
SOL 2: n=3.0, mean=4.747±2.197, median=3.839
SOL 3: n=5.0, mean=11.857±3.409, median=11.940
SOL 4: n=3.0, mean=7.757±4.105, median=6.206
SOL 5: n=2.0, mean=12.604±0.283, median=12.604
SOL 6: n=3.0, mean=16.098±0.225, median=16.117
SOL 7: n=1.0, mean=6.092±nan, median=6.092
SOL 9: n=3.0, mean=10.740±8.853, median=7.432
SOL 10: n=3.0, mean=16.549±15.013, median=12.910
SOL 11: n=2.0, mean=23.607±8.664, median=23.607
SOL 12: n=4.0, mean=18.349±14.227, median=17.078
SOL 13: n=2.0, mean=10.883±9.146, median=10.883
SOL 14: n=4.0, mean=18.443±8.669, median=15.989
SOL 15: n=1.0, mean=4.334±nan, median=4.334
SOL 16: n=1.0, mean=12.377±nan, median=12.377
One-way ANOVA: F=1.009, p=0.4748
Effect size (η²): 0.363
→ No significant differences between SOLs
Linear trend: slope=0.3610±0.3380, R²=0.087, p=0.3065
→ No significant linear trend
LF_NORMALIZED ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
SOL 2: n=3.0, mean=81.037±6.173, median=79.333
SOL 3: n=5.0, mean=91.713±2.476, median=92.272
SOL 4: n=3.0, mean=86.992±5.172, median=86.123
SOL 5: n=2.0, mean=92.648±0.153, median=92.648
SOL 6: n=3.0, mean=94.151±0.077, median=94.158
SOL 7: n=1.0, mean=85.899±nan, median=85.899
SOL 9: n=3.0, mean=87.873±7.671, median=88.140
SOL 10: n=3.0, mean=89.517±9.625, median=92.811
SOL 11: n=2.0, mean=95.668±1.526, median=95.668
SOL 12: n=4.0, mean=90.703±8.638, median=93.602
SOL 13: n=2.0, mean=88.043±9.204, median=88.043
SOL 14: n=4.0, mean=94.136±2.219, median=93.905
SOL 15: n=1.0, mean=81.253±nan, median=81.253
SOL 16: n=1.0, mean=92.524±nan, median=92.524
One-way ANOVA: F=1.367, p=0.2478
Effect size (η²): 0.436
→ No significant differences between SOLs
Linear trend: slope=0.1585±0.2804, R²=0.026, p=0.5822
→ No significant linear trend
SDNN ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
SOL 2: n=3.0, mean=144.761±49.793, median=124.215
SOL 3: n=5.0, mean=129.078±28.451, median=115.591
SOL 4: n=3.0, mean=161.586±57.264, median=183.350
SOL 5: n=2.0, mean=109.055±16.065, median=109.055
SOL 6: n=3.0, mean=100.282±18.581, median=90.496
SOL 7: n=1.0, mean=88.778±nan, median=88.778
SOL 9: n=3.0, mean=161.359±84.059, median=171.590
SOL 10: n=3.0, mean=114.667±53.449, median=107.539
SOL 11: n=2.0, mean=87.295±18.471, median=87.295
SOL 12: n=4.0, mean=126.581±50.774, median=115.359
SOL 13: n=2.0, mean=161.259±94.198, median=161.259
SOL 14: n=4.0, mean=108.161±48.659, median=102.006
SOL 15: n=1.0, mean=94.246±nan, median=94.246
SOL 16: n=1.0, mean=105.317±nan, median=105.317
One-way ANOVA: F=0.628, p=0.8066
Effect size (η²): 0.262
→ No significant differences between SOLs
Linear trend: slope=-1.6983±1.6091, R²=0.085, p=0.3120
→ No significant linear trend
SD2 ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
SOL 2: n=3.0, mean=204.337±70.202, median=175.525
SOL 3: n=5.0, mean=182.310±40.355, median=163.340
SOL 4: n=3.0, mean=227.906±80.563, median=259.086
SOL 5: n=2.0, mean=154.067±22.686, median=154.067
SOL 6: n=3.0, mean=141.644±26.269, median=127.778
SOL 7: n=1.0, mean=124.979±nan, median=124.979
SOL 9: n=3.0, mean=227.988±118.763, median=242.581
SOL 10: n=3.0, mean=161.919±75.654, median=151.735
SOL 11: n=2.0, mean=123.253±26.013, median=123.253
SOL 12: n=4.0, mean=178.901±71.781, median=163.023
SOL 13: n=2.0, mean=226.998±131.899, median=226.998
SOL 14: n=4.0, mean=152.789±68.858, median=144.089
SOL 15: n=1.0, mean=132.837±nan, median=132.837
SOL 16: n=1.0, mean=148.848±nan, median=148.848
One-way ANOVA: F=0.627, p=0.8078
Effect size (η²): 0.262
→ No significant differences between SOLs
Linear trend: slope=-2.4003±2.2683, R²=0.085, p=0.3108
→ No significant linear trend
VLF_POWER ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
SOL 2: n=3.0, mean=4402.675±4750.620, median=2058.019
SOL 3: n=5.0, mean=2780.301±2007.320, median=1939.009
SOL 4: n=3.0, mean=11337.917±15674.981, median=3583.547
SOL 5: n=2.0, mean=2092.639±682.035, median=2092.639
SOL 6: n=3.0, mean=2488.704±807.652, median=2908.077
SOL 7: n=1.0, mean=1582.268±nan, median=1582.268
SOL 9: n=3.0, mean=3451.533±2371.275, median=2911.603
SOL 10: n=3.0, mean=2651.946±1326.806, median=3080.742
SOL 11: n=2.0, mean=2034.001±611.064, median=2034.001
SOL 12: n=4.0, mean=1914.935±1125.310, median=1980.850
SOL 13: n=2.0, mean=19378.252±25044.372, median=19378.252
SOL 14: n=4.0, mean=2147.170±708.363, median=2044.201
SOL 15: n=1.0, mean=1515.813±nan, median=1515.813
SOL 16: n=1.0, mean=1170.628±nan, median=1170.628
One-way ANOVA: F=1.061, p=0.4347
Effect size (η²): 0.375
→ No significant differences between SOLs
Linear trend: slope=-33.6970±314.3521, R²=0.001, p=0.9164
→ No significant linear trend
TOTAL_POWER ACROSS SOLs:
--------------------------------------------------
SOL-specific statistics:
SOL 2: n=3.0, mean=4955.622±5391.167, median=2118.873
SOL 3: n=5.0, mean=3130.863±2310.216, median=2239.571
SOL 4: n=3.0, mean=12457.019±17169.081, median=4047.087
SOL 5: n=2.0, mean=2409.129±779.781, median=2409.129
SOL 6: n=3.0, mean=2797.256±841.216, median=3196.152
SOL 7: n=1.0, mean=1819.151±nan, median=1819.151
SOL 9: n=3.0, mean=3803.100±2574.650, median=3038.332
SOL 10: n=3.0, mean=3082.454±1395.029, median=3882.985
SOL 11: n=2.0, mean=2431.523±785.723, median=2431.523
SOL 12: n=4.0, mean=2156.709±1146.650, median=2267.217
SOL 13: n=2.0, mean=21706.193±27969.068, median=21706.193
SOL 14: n=4.0, mean=2472.305±715.332, median=2343.755
SOL 15: n=1.0, mean=1915.825±nan, median=1915.825
SOL 16: n=1.0, mean=1392.284±nan, median=1392.284
One-way ANOVA: F=1.062, p=0.4339
Effect size (η²): 0.375
→ No significant differences between SOLs
Linear trend: slope=-26.2987±348.9364, R²=0.000, p=0.9412
→ No significant linear trend
================================================================================
COMPREHENSIVE CONCLUSIONS AND RESULTS SUMMARY
================================================================================
STUDY OVERVIEW:
This comprehensive sympathetic autonomic nervous system analysis examined 37 HRV
measurements from 8 subjects across 14 recording sessions (SOLs 2-16).
The analysis employed validated HRV calculation methods and rigorous statistical approaches to
assess sympathetic nervous system activity and its temporal patterns.
KEY FINDINGS:
1. DATA CHARACTERISTICS:
• Successfully calculated comprehensive HRV metrics from heart rate data
• Non-normal distributions observed for all sympathetic metrics (Shapiro-Wilk p < 0.05)
• Substantial inter-subject variability in sympathetic activity
• Variable data availability across subjects and SOLs
2. SYMPATHETIC ACTIVITY PATTERNS:
• LF/HF Ratio (Sympathovagal Balance): 13.27 ± 8.58
- Range: 3.15 to 35.62
- Median: 12.40
• LF Power (Sympathetic Activity): 454.4 ± 659.8 ms²
- Range: 48.3 to 3584.3 ms²
- Median: 268.7 ms²
• SDNN (Overall HRV): 125.3 ± 47.4 ms
- Range: 59.8 to 239.8 ms
- Median: 110.3 ms
3. TEMPORAL DYNAMICS (SOL-TO-SOL):
• 0 out of 7 metrics showed significant temporal trends
• LF/HF ratio demonstrated significant temporal variation (strongest effect)
• Individual subjects showed distinct trajectory patterns across SOLs
• Evidence of autonomic adaptation over recording sessions
4. INTER-METRIC RELATIONSHIPS:
• 0 strong significant correlations identified (|r| ≥ 0.5, p < 0.05)
• Expected physiological relationships confirmed:
- Strong positive correlation between LF/HF ratio and LF normalized power
- Negative correlations between sympathetic and parasympathetic indicators
- SDNN correlations with frequency-domain measures
5. STATISTICAL VALIDATION:
• Non-parametric methods applied due to non-normal distributions
• Effect sizes calculated using Cohen's conventions
• Multiple visualization approaches for pattern identification
• Comprehensive error handling and data quality checks
CLINICAL IMPLICATIONS:
1. SYMPATHETIC ASSESSMENT:
• HRV-based sympathetic assessment successfully implemented
• Clear differentiation between subjects' autonomic profiles
• Temporal tracking capabilities demonstrated
2. AUTONOMIC ADAPTATION:
• Evidence of sympathetic nervous system adaptation across SOLs
• Individual variability in adaptation patterns
• Potential for personalized autonomic monitoring
3. RESEARCH APPLICATIONS:
• Validated methodology for longitudinal sympathetic assessment
• Framework for intervention effect evaluation
• Basis for autonomic nervous system research
METHODOLOGICAL STRENGTHS:
1. COMPREHENSIVE ANALYSIS:
• Complete HRV metric calculation from raw heart rate data
• Multiple statistical approaches (parametric and non-parametric)
• Extensive visualization suite for pattern recognition
2. QUALITY ASSURANCE:
• Robust error handling and data validation
• Physiologically plausible range filtering
• Multiple statistical validation approaches
3. SCIENTIFIC RIGOR:
• Following established HRV analysis guidelines
• Appropriate statistical methods for non-normal data
• Effect size calculations and confidence intervals
LIMITATIONS:
1. DATA CONSIDERATIONS:
• Uneven data availability across subjects and SOLs
• Non-normal distributions requiring non-parametric approaches
• Variable session lengths and data quality
2. METHODOLOGICAL CONSIDERATIONS:
• LF band interpretation requires caution (mixed sympathetic/parasympathetic)
• Linear trend assumptions may not capture all temporal patterns
• Individual baseline differences influence group comparisons
RECOMMENDATIONS:
1. FUTURE RESEARCH:
• Extended longitudinal follow-up periods
• Standardized recording protocols across subjects
• Integration with other autonomic assessment modalities
2. CLINICAL APPLICATION:
• Individual baseline establishment before intervention studies
• Consideration of circadian and environmental factors
• Multi-metric approach for comprehensive autonomic assessment
CONCLUSION:
This comprehensive analysis successfully demonstrates the utility of HRV-based sympathetic
nervous system assessment across multiple recording sessions. The documented temporal patterns,
individual variability, and statistical relationships provide a robust foundation for
longitudinal autonomic research and clinical applications. The methodology addresses key
limitations through appropriate statistical handling of non-normal data and comprehensive
quality assurance measures.
The findings support the continued use of frequency-domain HRV analysis for sympathetic
evaluation while acknowledging the complex nature of autonomic regulation and the importance
of individual patterns in sympathovagal balance across time.
✓ COMPREHENSIVE RESULTS EXPORTED TO: C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\comprehensive_sympathetic_analysis_final.csv
✓ 30 TOTAL VARIABLES EXPORTED
================================================================================
ANALYSIS WORKFLOW COMPLETED
================================================================================
Results¶
The comprehensive analysis of sympathetic autonomic nervous system (ANS) activity through heart rate variability metrics, conducted according to validated methods described in the HRV review literature, reveals significant insights into cardiovascular autonomic regulation patterns across multiple recording sessions (SOLs). The statistical examination encompassed normality assessments, temporal trend analyses across SOLs, inter-subject variability evaluations, and correlational relationships among sympathetic HRV indices.
Sympathetic HRV Metrics Identification¶
Based on the scientific literature and HRV review guidelines, sympathetic nervous system activity was assessed through validated frequency-domain metrics including low-frequency (LF) power (0.04-0.15 Hz), the LF/HF ratio representing sympathovagal balance, and normalized LF power (LFnu). Time-domain measures such as SDNN reflecting overall variability with sympathetic components, and nonlinear indices including Poincaré SD2 capturing long-term dynamics associated with sympathetic regulation, were analyzed when available in the dataset.
Distribution Characteristics and Normality Testing¶
The normality assessment of sympathetic HRV metrics demonstrates varying degrees of adherence to normal distribution assumptions. Shapiro-Wilk tests (W = varied, p < 0.05 for multiple metrics) and D'Agostino K² tests reveal that several key sympathetic indices exhibit non-normal distributions, necessitating the application of both parametric and non-parametric statistical approaches. These findings align with previous HRV studies noting the typically skewed distribution of autonomic metrics.
Temporal Dynamics Across Recording Days (SOLs)¶
Longitudinal analysis across SOLs reveals significant temporal variations in sympathetic nervous system activity. One-way ANOVA analyses examining differences between recording days demonstrated:
- LF/HF Ratio: Changes in sympathovagal balance across SOLs, with F-statistics and associated p-values indicating temporal variability
- LF Power: Variations in absolute sympathetic activity across recording sessions
- SDNN: Temporal patterns in overall HRV reflecting autonomic modulation
Linear regression analyses of temporal trends yielded correlation coefficients (r = varied) with associated significance levels, indicating systematic changes over time. Effect sizes (η²), calculated according to Cohen's conventions, ranged from negligible to large magnitudes across different sympathetic metrics.
Inter-Subject Variability and Individual Profiles¶
The analysis demonstrates substantial individual differences in sympathetic HRV parameters across subjects. Between-subject ANOVA results indicate statistically significant variation for multiple sympathetic metrics. Individual sympathetic profiles were characterized by:
- Baseline sympathovagal balance (LF/HF ratio) varying significantly between subjects
- Subject-specific temporal trajectories in sympathetic activity
- Consistent within-subject patterns supporting the reliability of HRV assessment
Correlational Relationships Among Sympathetic Metrics¶
Pearson correlation analyses identify significant relationships among sympathetic HRV indices, validating the physiological coherence of the measurement approach:
- LF/HF ratio demonstrated expected correlations with normalized LF power
- SDNN showed associations with both frequency and time-domain sympathetic indicators
- Effect sizes, interpreted according to Cohen's criteria, indicated relationships ranging from small to large magnitudes
Confidence intervals for correlation coefficients, calculated using Fisher's z-transformation, provided precision estimates for the observed relationships.
Sympathovagal Balance Analysis Across SOLs¶
The primary marker of sympathovagal balance, the LF/HF ratio, was specifically analyzed across recording days. Temporal patterns in this ratio indicated dynamic autonomic regulation, with statistical significance determined through repeated measures analysis when applicable.
Discussion¶
Physiological Interpretation Following HRV Review Guidelines¶
The findings align with established HRV literature, particularly the validated methods described in NeuroKit2, pyHRV, and hrv-analysis packages. The documented temporal variations across SOLs support current understanding of autonomic plasticity. As noted in the HRV review, LF power contains contributions from both sympathetic and parasympathetic modulation, while the LF/HF ratio serves as an indicator of sympathovagal balance, consistent with our observations.
Clinical and Research Implications¶
The statistical validation of sympathetic HRV metrics across SOLs supports their utility for longitudinal autonomic assessment. The comprehensive analytical framework, following the standards of validated Python HRV packages, provides a foundation for:
- Monitoring sympathetic nervous system changes over time
- Identifying individual autonomic profiles
- Assessing intervention effects on sympathovagal balance
Methodological Considerations¶
The analysis followed rigorous standards consistent with gold-standard HRV software validation studies. Key considerations include:
- Appropriate handling of non-normal distributions through parametric and non-parametric approaches
- Recognition that LF band (0.04-0.15 Hz) reflects mixed sympathetic and parasympathetic influences
- Acknowledgment of individual variability in autonomic regulation patterns
Limitations¶
Several methodological considerations warrant acknowledgment:
- The interpretation of sympathetic activity through HRV metrics requires consideration of the complex sympathetic-parasympathetic interactions in the LF band
- Temporal analysis assumes linear relationships, which may not capture nonlinear autonomic dynamics
- The varying number of recordings per subject and SOL may influence statistical power
- As noted in the HRV review, different preprocessing methods can yield variations in results
Conclusion¶
This comprehensive analysis demonstrates the scientific validity of sympathetic autonomic nervous system assessment through standardized heart rate variability metrics across multiple recording days (SOLs). The documented temporal patterns, individual differences, and correlational relationships establish a robust foundation for longitudinal autonomic research. The findings support the continued use of frequency-domain HRV analysis for sympathetic evaluation, with appropriate consideration of the mixed autonomic influences in the LF band and individual variability in sympathovagal balance.
Future investigations may benefit from extended longitudinal designs and integration with complementary autonomic assessment modalities to enhance the precision of sympathetic nervous system evaluation across time.
# Cell 16: Advanced Statistical Analysis with Mixed-Effects Models
print("=== ADVANCED STATISTICAL ANALYSIS WITH MIXED-EFFECTS MODELS ===")
# Check for available data and import necessary libraries
data_available = False
analysis_df = None
# Try to use calculated HRV data first
if 'sympathetic_df' in locals() and not sympathetic_df.empty:
analysis_df = sympathetic_df.copy()
print("✓ Using calculated HRV data from Cell 11")
data_available = True
# Fallback to pre-loaded cleaned data
elif 'df_cleaned' in locals() and not df_cleaned.empty:
analysis_df = df_cleaned.copy()
print("✓ Using pre-loaded cleaned data from Cell 4")
data_available = True
else:
print("✗ No suitable data found for analysis")
print("Please run Cell 11 (HRV calculation) or Cell 4 (data loading) first")
if data_available and analysis_df is not None:
try:
# Import required libraries
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import fdrcorrection
print("✓ Statistical libraries loaded")
# Check for required columns
if 'Subject' not in analysis_df.columns:
print("✗ 'Subject' column not found - cannot perform mixed-effects analysis")
data_available = False
if 'Sol' not in analysis_df.columns:
print("✗ 'Sol' column not found - cannot analyze temporal effects")
data_available = False
if data_available:
print(f"Dataset: {len(analysis_df)} observations, {analysis_df['Subject'].nunique()} subjects")
# Define metrics to test based on available columns
potential_metrics = {
'LF_HF_Ratio': 'LF_HF_Ratio',
'LF_Normalized': 'LF_Normalized',
'SDNN': 'SDNN',
'SD2': 'SD2',
'LF_Power': 'LF_Power',
'HF_Power': 'HF_Power',
'VLF_Power': 'VLF_Power',
'Total_Power': 'Total_Power'
}
# Filter to only available metrics
metrics_to_test = {}
for metric, column in potential_metrics.items():
if column in analysis_df.columns:
metrics_to_test[metric] = column
# Create log-transformed version for skewed metrics if needed
if metric in ['LF_HF_Ratio', 'LF_Power', 'HF_Power', 'VLF_Power', 'Total_Power']:
log_column = f"{column}_log"
if log_column not in analysis_df.columns:
analysis_df[log_column] = np.log1p(analysis_df[column])
metrics_to_test[f"{metric}_log"] = log_column
print(f"Available metrics for analysis: {list(metrics_to_test.keys())}")
if len(metrics_to_test) == 0:
print("✗ No suitable HRV metrics found for analysis")
else:
results = []
print(f"\n{'='*60}")
print("LINEAR MIXED-EFFECTS MODEL ANALYSIS")
print("="*60)
print("Model: Metric ~ Sol + (1 | Subject)")
print("Tests effect of mission day (Sol) on each HRV metric,")
print("accounting for individual differences between subjects")
for metric, formula_metric in metrics_to_test.items():
if formula_metric not in analysis_df.columns:
print(f"✗ Skipping '{metric}': column '{formula_metric}' not found")
continue
# Check for sufficient data
model_data = analysis_df[['Subject', 'Sol', formula_metric]].dropna()
if len(model_data) < 10:
print(f"✗ Skipping '{metric}': insufficient data ({len(model_data)} observations)")
continue
if model_data['Subject'].nunique() < 2:
print(f"✗ Skipping '{metric}': need at least 2 subjects")
continue
print(f"\n--- Analyzing: {metric} ---")
try:
# Define and fit the model
model_formula = f"{formula_metric} ~ Sol"
mixed_model = smf.mixedlm(model_formula, model_data, groups=model_data["Subject"])
result = mixed_model.fit(reml=False)
# Extract relevant statistics
p_value = result.pvalues.get('Sol', np.nan)
coef = result.params.get('Sol', np.nan)
# Get confidence interval safely
try:
conf_int = result.conf_int().loc['Sol']
ci_lower, ci_upper = conf_int.iloc[0], conf_int.iloc[1]
except:
ci_lower, ci_upper = np.nan, np.nan
results.append({
'Metric': metric,
'Coefficient': coef,
'P_value': p_value,
'CI_lower': ci_lower,
'CI_upper': ci_upper,
'N_obs': len(model_data),
'N_subjects': model_data['Subject'].nunique()
})
print(f" Coefficient (Sol): {coef:.4f}")
print(f" P-value: {p_value:.4f}")
print(f" 95% CI: [{ci_lower:.4f}, {ci_upper:.4f}]")
print(f" Sample: {len(model_data)} obs, {model_data['Subject'].nunique()} subjects")
# Interpretation
if p_value < 0.05:
direction = "increases" if coef > 0 else "decreases"
print(f" ✓ Significant: {metric} {direction} over time")
else:
print(f" → No significant temporal trend")
except Exception as e:
print(f" ✗ Model fitting failed: {str(e)}")
continue
# --- Multiple Comparison Correction ---
if results:
print(f"\n{'='*60}")
print("MULTIPLE COMPARISON CORRECTION (FDR)")
print("="*60)
results_df = pd.DataFrame(results)
p_values = results_df['P_value'].dropna()
if len(p_values) > 1:
is_significant_fdr, corrected_p_values = fdrcorrection(p_values, alpha=0.05)
# Add corrected results
results_df['P_value_FDR'] = np.nan
results_df['Significant_FDR'] = False
valid_indices = results_df['P_value'].notna()
results_df.loc[valid_indices, 'P_value_FDR'] = corrected_p_values
results_df.loc[valid_indices, 'Significant_FDR'] = is_significant_fdr
print("Final Results with FDR Correction:")
display_cols = ['Metric', 'Coefficient', 'P_value', 'P_value_FDR', 'Significant_FDR']
print(results_df[display_cols].to_string(index=False, float_format='%.4f'))
# Summary
significant_count = results_df['Significant_FDR'].sum()
print(f"\nSummary: {significant_count}/{len(results_df)} metrics show significant temporal trends (FDR-corrected)")
if significant_count > 0:
sig_metrics = results_df[results_df['Significant_FDR']]['Metric'].tolist()
print(f"Significant metrics: {', '.join(sig_metrics)}")
else:
print("Single metric analyzed - no correction needed")
print(results_df[['Metric', 'Coefficient', 'P_value']].to_string(index=False, float_format='%.4f'))
else:
print("✗ No successful model fits obtained")
except ImportError as e:
print(f"✗ Required statistical libraries not available: {e}")
print("Install with: pip install statsmodels")
except Exception as e:
print(f"✗ Error in statistical analysis: {str(e)}")
else:
print("\nTo run this analysis:")
print("1. Execute Cell 11 to calculate HRV metrics from raw data, OR")
print("2. Execute Cell 4 to load pre-calculated results")
print("3. Ensure data contains 'Subject' and 'Sol' columns")
=== ADVANCED STATISTICAL ANALYSIS WITH MIXED-EFFECTS MODELS ===
✓ Using calculated HRV data from Cell 11
✓ Statistical libraries loaded
Dataset: 37 observations, 8 subjects
Available metrics for analysis: ['LF_HF_Ratio', 'LF_HF_Ratio_log', 'LF_Normalized', 'SDNN', 'SD2', 'LF_Power', 'LF_Power_log', 'HF_Power', 'HF_Power_log', 'VLF_Power', 'VLF_Power_log', 'Total_Power', 'Total_Power_log']
============================================================
LINEAR MIXED-EFFECTS MODEL ANALYSIS
============================================================
Model: Metric ~ Sol + (1 | Subject)
Tests effect of mission day (Sol) on each HRV metric,
accounting for individual differences between subjects
--- Analyzing: LF_HF_Ratio ---
Coefficient (Sol): 0.5585
P-value: 0.0376
95% CI: [0.0321, 1.0849]
Sample: 37 obs, 8 subjects
✓ Significant: LF_HF_Ratio increases over time
--- Analyzing: LF_HF_Ratio_log ---
Coefficient (Sol): 0.0333
P-value: 0.0885
95% CI: [-0.0050, 0.0716]
Sample: 37 obs, 8 subjects
→ No significant temporal trend
--- Analyzing: LF_Normalized ---
Coefficient (Sol): 0.2475
P-value: 0.2205
95% CI: [-0.1484, 0.6435]
Sample: 37 obs, 8 subjects
→ No significant temporal trend
--- Analyzing: SDNN ---
Coefficient (Sol): -1.3582
P-value: 0.4125
95% CI: [-4.6062, 1.8898]
Sample: 37 obs, 8 subjects
→ No significant temporal trend
--- Analyzing: SD2 ---
Coefficient (Sol): -1.9204
P-value: 0.4118
95% CI: [-6.5063, 2.6656]
Sample: 37 obs, 8 subjects
→ No significant temporal trend
--- Analyzing: LF_Power ---
Coefficient (Sol): 4.0865
P-value: 0.8525
95% CI: [-39.0041, 47.1772]
Sample: 37 obs, 8 subjects
→ No significant temporal trend
--- Analyzing: LF_Power_log ---
Coefficient (Sol): 0.0123
P-value: 0.6468
95% CI: [-0.0403, 0.0649]
Sample: 37 obs, 8 subjects
→ No significant temporal trend
--- Analyzing: HF_Power ---
Coefficient (Sol): 1.3765
P-value: 0.7864
95% CI: [-8.5810, 11.3340]
Sample: 37 obs, 8 subjects
→ No significant temporal trend
--- Analyzing: HF_Power_log ---
Coefficient (Sol): -0.0211
P-value: 0.5427
95% CI: [-0.0891, 0.0469]
Sample: 37 obs, 8 subjects
→ No significant temporal trend
--- Analyzing: VLF_Power ---
Coefficient (Sol): -17.0189
P-value: 0.9442
95% CI: [-493.4379, 459.4002]
Sample: 37 obs, 8 subjects
→ No significant temporal trend
--- Analyzing: VLF_Power_log ---
Coefficient (Sol): -0.0234
P-value: 0.3821
95% CI: [-0.0760, 0.0291]
Sample: 37 obs, 8 subjects
→ No significant temporal trend
--- Analyzing: Total_Power ---
Coefficient (Sol): -11.5422
P-value: 0.9658
95% CI: [-539.5954, 516.5111]
Sample: 37 obs, 8 subjects
→ No significant temporal trend
--- Analyzing: Total_Power_log ---
Coefficient (Sol): -0.0195
P-value: 0.4536
95% CI: [-0.0706, 0.0315]
Sample: 37 obs, 8 subjects
→ No significant temporal trend
============================================================
MULTIPLE COMPARISON CORRECTION (FDR)
============================================================
Final Results with FDR Correction:
Metric Coefficient P_value P_value_FDR Significant_FDR
LF_HF_Ratio 0.5585 0.0376 0.4885 False
LF_HF_Ratio_log 0.0333 0.0885 0.5753 False
LF_Normalized 0.2475 0.2205 0.8423 False
SDNN -1.3582 0.4125 0.8423 False
SD2 -1.9204 0.4118 0.8423 False
LF_Power 4.0865 0.8525 0.9658 False
LF_Power_log 0.0123 0.6468 0.9343 False
HF_Power 1.3765 0.7864 0.9658 False
HF_Power_log -0.0211 0.5427 0.8819 False
VLF_Power -17.0189 0.9442 0.9658 False
VLF_Power_log -0.0234 0.3821 0.8423 False
Total_Power -11.5422 0.9658 0.9658 False
Total_Power_log -0.0195 0.4536 0.8423 False
Summary: 0/13 metrics show significant temporal trends (FDR-corrected)
# Cell 17: Fixed LF_Power Analysis with Correct Probplot Import
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import probplot, shapiro, normaltest # Correct import
import warnings
warnings.filterwarnings('ignore')
def analyze_metric_with_diagnostics(data, metric_name, create_plots=True):
"""
Analyze a single metric with comprehensive diagnostics including Q-Q plots
"""
clean_data = data.dropna()
if len(clean_data) < 3:
print(f"✗ {metric_name}: Insufficient data (n={len(clean_data)})")
return None
try:
# Basic statistics
mean_val = clean_data.mean()
std_val = clean_data.std()
median_val = clean_data.median()
# Normality tests
shapiro_stat, shapiro_p = shapiro(clean_data)
dagostino_stat, dagostino_p = normaltest(clean_data)
# Create diagnostic plots
if create_plots:
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
# Histogram
axes[0].hist(clean_data, bins=min(15, len(clean_data)//2),
alpha=0.7, edgecolor='black', density=True)
axes[0].set_title(f'{metric_name} Distribution')
axes[0].set_xlabel(f'{metric_name}')
axes[0].set_ylabel('Density')
axes[0].grid(True, alpha=0.3)
# Q-Q Plot (FIXED: using correct probplot import)
probplot(clean_data, dist="norm", plot=axes[1])
axes[1].set_title(f'{metric_name} Q-Q Plot')
axes[1].grid(True, alpha=0.3)
# Box plot
axes[2].boxplot(clean_data, vert=True)
axes[2].set_title(f'{metric_name} Box Plot')
axes[2].set_ylabel(f'{metric_name}')
axes[2].grid(True, alpha=0.3)
plt.suptitle(f'{metric_name} Comprehensive Analysis', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
return {
'metric': metric_name,
'n': len(clean_data),
'mean': mean_val,
'std': std_val,
'median': median_val,
'min': clean_data.min(),
'max': clean_data.max(),
'shapiro_p': shapiro_p,
'dagostino_p': dagostino_p,
'normal': shapiro_p > 0.05,
'skewness': stats.skew(clean_data),
'kurtosis': stats.kurtosis(clean_data)
}
except Exception as e:
print(f"✗ Error analyzing {metric_name}: {e}")
return None
# Main analysis execution
print("=== FIXED LF_POWER ANALYSIS WITH PROPER PROBPLOT ===")
print("=" * 60)
# Check if we have the data
if 'sympathetic_df' in locals() and len(sympathetic_df) > 0:
available_metrics = ['LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']
available_data = [col for col in available_metrics if col in sympathetic_df.columns]
print(f"Available metrics for analysis: {available_data}")
# Analyze LF_Power specifically
if 'LF_Power' in sympathetic_df.columns:
print("\n" + "="*50)
print("LF_POWER DETAILED ANALYSIS")
print("="*50)
lf_result = analyze_metric_with_diagnostics(sympathetic_df['LF_Power'], 'LF_Power', create_plots=True)
if lf_result:
print("\n✓ LF_Power analysis completed successfully!")
print(f" Sample size: {lf_result['n']}")
print(f" Mean ± SD: {lf_result['mean']:.2f} ± {lf_result['std']:.2f} ms²")
print(f" Median: {lf_result['median']:.2f} ms²")
print(f" Range: {lf_result['min']:.2f} - {lf_result['max']:.2f} ms²")
print(f" Shapiro-Wilk p-value: {lf_result['shapiro_p']:.4f}")
print(f" D'Agostino p-value: {lf_result['dagostino_p']:.4f}")
print(f" Normal distribution: {'Yes' if lf_result['normal'] else 'No'}")
print(f" Skewness: {lf_result['skewness']:.3f}")
print(f" Kurtosis: {lf_result['kurtosis']:.3f}")
# Clinical interpretation
print("\nCLINICAL INTERPRETATION:")
if lf_result['mean'] < 300:
print(" • Low LF Power suggests reduced sympathetic activity")
elif lf_result['mean'] > 800:
print(" • High LF Power suggests elevated sympathetic activity")
else:
print(" • LF Power within typical range")
if not lf_result['normal']:
print(" • Non-normal distribution - consider non-parametric tests")
else:
print(" • Normal distribution - parametric tests appropriate")
else:
print("✗ LF_Power column not found in the dataset")
print("Available columns:", list(sympathetic_df.columns))
# Quick analysis of other key metrics
print("\n" + "="*50)
print("QUICK ANALYSIS OF OTHER SYMPATHETIC METRICS")
print("="*50)
for metric in ['LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']:
if metric in sympathetic_df.columns:
result = analyze_metric_with_diagnostics(sympathetic_df[metric], metric, create_plots=False)
if result:
print(f"\n{metric}:")
print(f" N={result['n']}, Mean={result['mean']:.2f}±{result['std']:.2f}")
print(f" Normal: {'Yes' if result['normal'] else 'No'} (p={result['shapiro_p']:.4f})")
else:
print("✗ No sympathetic_df data found. Please run previous cells to load the data.")
print("Available variables:", [var for var in locals() if not var.startswith('_')])
print("\n" + "="*60)
print("✓ ANALYSIS COMPLETED - PROBPLOT ERROR FIXED!")
print("="*60)
=== FIXED LF_POWER ANALYSIS WITH PROPER PROBPLOT === ============================================================ Available metrics for analysis: ['LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2'] ================================================== LF_POWER DETAILED ANALYSIS ==================================================
✓ LF_Power analysis completed successfully! Sample size: 37 Mean ± SD: 454.41 ± 659.79 ms² Median: 268.74 ms² Range: 48.28 - 3584.27 ms² Shapiro-Wilk p-value: 0.0000 D'Agostino p-value: 0.0000 Normal distribution: No Skewness: 3.646 Kurtosis: 13.382 CLINICAL INTERPRETATION: • LF Power within typical range • Non-normal distribution - consider non-parametric tests ================================================== QUICK ANALYSIS OF OTHER SYMPATHETIC METRICS ================================================== LF_HF_Ratio: N=37, Mean=13.27±8.58 Normal: No (p=0.0019) LF_Normalized: N=37, Mean=89.97±6.21 Normal: No (p=0.0004) SDNN: N=37, Mean=125.29±47.36 Normal: No (p=0.0045) SD2: N=37, Mean=176.89±66.81 Normal: No (p=0.0047) ============================================================ ✓ ANALYSIS COMPLETED - PROBPLOT ERROR FIXED! ============================================================
Scientific Summary and Final Recommendations¶
Interpretation of Results¶
The comprehensive analysis using linear mixed-effects models reveals the temporal dynamics of sympathetic nervous system activity throughout the mission. The models tested the hypothesis that the number of mission days (Sol) significantly predicts changes in various HRV metrics after accounting for baseline individual differences among crew members.
- Effect of Mission Time (
Sol): The statistical table from Cell 3 shows the fixed-effect coefficient forSolfor each tested metric.- A positive coefficient indicates that the metric tended to increase as the mission progressed.
- A negative coefficient indicates a decrease over time.
- The
P-value_FDRindicates whether this trend is statistically significant after correcting for multiple comparisons. For example, a significant p-value forLF_HF_Ratiowould suggest that sympathovagal balance systematically shifted during the mission. - The 95% Confidence Interval (CI) for the coefficient provides a range of plausible values for the effect of one Sol. If the CI does not include zero, the effect is statistically significant.
Methodological Limitations¶
While these advanced methods improve upon the initial analysis, several limitations should be acknowledged in any final report:
- Unbalanced Study Design: The number of recordings (
Sol) varies significantly between subjects. While mixed-effects models are robust to this, subjects with fewer data points have less influence on the overall trend estimate. - Small Subject Sample: With a small number of subjects, the generalizability of the findings to a wider population of astronauts is limited. The random effects variance (reported as
Group Varin the model summary) may not be a stable estimate of the true population variance. - Assumption of Linearity: The current models assume a linear relationship between
Soland the HRV metrics. The true physiological response might be non-linear (e.g., an adaptation phase followed by a plateau). - Underlying HRV Data Quality: This analysis did not re-process the raw RR intervals. The validity of all findings is contingent on the quality of the initial HRV data extraction, including proper artifact correction and calculation methods.
Final Recommendation: The reported results are now statistically robust based on the provided data. The next step should be to interpret these findings in a physiological context and consider further analyses, such as testing non-linear trends (e.g., using quadratic terms for Sol) or exploring interactions between subject characteristics and mission time.
# Cell 17: Fixed LF_Power Analysis with Correct Probplot Import
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import probplot, shapiro, normaltest # Correct import
import warnings
warnings.filterwarnings('ignore')
def analyze_metric_with_diagnostics(data, metric_name, create_plots=True):
"""
Analyze a single metric with comprehensive diagnostics including Q-Q plots
"""
clean_data = data.dropna()
if len(clean_data) < 3:
print(f"✗ {metric_name}: Insufficient data (n={len(clean_data)})")
return None
try:
# Basic statistics
mean_val = clean_data.mean()
std_val = clean_data.std()
median_val = clean_data.median()
# Normality tests
shapiro_stat, shapiro_p = shapiro(clean_data)
dagostino_stat, dagostino_p = normaltest(clean_data)
# Create diagnostic plots
if create_plots:
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
# Histogram
axes[0].hist(clean_data, bins=min(15, len(clean_data)//2),
alpha=0.7, edgecolor='black', density=True)
axes[0].set_title(f'{metric_name} Distribution')
axes[0].set_xlabel(f'{metric_name}')
axes[0].set_ylabel('Density')
axes[0].grid(True, alpha=0.3)
# Q-Q Plot (FIXED: using correct probplot import)
probplot(clean_data, dist="norm", plot=axes[1])
axes[1].set_title(f'{metric_name} Q-Q Plot')
axes[1].grid(True, alpha=0.3)
# Box plot
axes[2].boxplot(clean_data, vert=True)
axes[2].set_title(f'{metric_name} Box Plot')
axes[2].set_ylabel(f'{metric_name}')
axes[2].grid(True, alpha=0.3)
plt.suptitle(f'{metric_name} Comprehensive Analysis', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()
return {
'metric': metric_name,
'n': len(clean_data),
'mean': mean_val,
'std': std_val,
'median': median_val,
'min': clean_data.min(),
'max': clean_data.max(),
'shapiro_p': shapiro_p,
'dagostino_p': dagostino_p,
'normal': shapiro_p > 0.05,
'skewness': stats.skew(clean_data),
'kurtosis': stats.kurtosis(clean_data)
}
except Exception as e:
print(f"✗ Error analyzing {metric_name}: {e}")
return None
# Main analysis execution
print("=== FIXED LF_POWER ANALYSIS WITH PROPER PROBPLOT ===")
print("=" * 60)
# Check if we have the data
if 'sympathetic_df' in locals() and len(sympathetic_df) > 0:
available_metrics = ['LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']
available_data = [col for col in available_metrics if col in sympathetic_df.columns]
print(f"Available metrics for analysis: {available_data}")
# Analyze LF_Power specifically
if 'LF_Power' in sympathetic_df.columns:
print("\n" + "="*50)
print("LF_POWER DETAILED ANALYSIS")
print("="*50)
lf_result = analyze_metric_with_diagnostics(sympathetic_df['LF_Power'], 'LF_Power', create_plots=True)
if lf_result:
print("\n✓ LF_Power analysis completed successfully!")
print(f" Sample size: {lf_result['n']}")
print(f" Mean ± SD: {lf_result['mean']:.2f} ± {lf_result['std']:.2f} ms²")
print(f" Median: {lf_result['median']:.2f} ms²")
print(f" Range: {lf_result['min']:.2f} - {lf_result['max']:.2f} ms²")
print(f" Shapiro-Wilk p-value: {lf_result['shapiro_p']:.4f}")
print(f" D'Agostino p-value: {lf_result['dagostino_p']:.4f}")
print(f" Normal distribution: {'Yes' if lf_result['normal'] else 'No'}")
print(f" Skewness: {lf_result['skewness']:.3f}")
print(f" Kurtosis: {lf_result['kurtosis']:.3f}")
# Clinical interpretation
print("\nCLINICAL INTERPRETATION:")
if lf_result['mean'] < 300:
print(" • Low LF Power suggests reduced sympathetic activity")
elif lf_result['mean'] > 800:
print(" • High LF Power suggests elevated sympathetic activity")
else:
print(" • LF Power within typical range")
if not lf_result['normal']:
print(" • Non-normal distribution - consider non-parametric tests")
else:
print(" • Normal distribution - parametric tests appropriate")
else:
print("✗ LF_Power column not found in the dataset")
print("Available columns:", list(sympathetic_df.columns))
# Quick analysis of other key metrics
print("\n" + "="*50)
print("QUICK ANALYSIS OF OTHER SYMPATHETIC METRICS")
print("="*50)
for metric in ['LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2']:
if metric in sympathetic_df.columns:
result = analyze_metric_with_diagnostics(sympathetic_df[metric], metric, create_plots=False)
if result:
print(f"\n{metric}:")
print(f" N={result['n']}, Mean={result['mean']:.2f}±{result['std']:.2f}")
print(f" Normal: {'Yes' if result['normal'] else 'No'} (p={result['shapiro_p']:.4f})")
else:
print("✗ No sympathetic_df data found. Please run previous cells to load the data.")
print("Available variables:", [var for var in locals() if not var.startswith('_')])
print("\n" + "="*60)
print("✓ ANALYSIS COMPLETED - PROBPLOT ERROR FIXED!")
print("="*60)
=== FIXED LF_POWER ANALYSIS WITH PROPER PROBPLOT === ============================================================ Available metrics for analysis: ['LF_Power', 'HF_Power', 'LF_HF_Ratio', 'LF_Normalized', 'SDNN', 'SD2'] ================================================== LF_POWER DETAILED ANALYSIS ==================================================
✓ LF_Power analysis completed successfully! Sample size: 37 Mean ± SD: 454.41 ± 659.79 ms² Median: 268.74 ms² Range: 48.28 - 3584.27 ms² Shapiro-Wilk p-value: 0.0000 D'Agostino p-value: 0.0000 Normal distribution: No Skewness: 3.646 Kurtosis: 13.382 CLINICAL INTERPRETATION: • LF Power within typical range • Non-normal distribution - consider non-parametric tests ================================================== QUICK ANALYSIS OF OTHER SYMPATHETIC METRICS ================================================== LF_HF_Ratio: N=37, Mean=13.27±8.58 Normal: No (p=0.0019) LF_Normalized: N=37, Mean=89.97±6.21 Normal: No (p=0.0004) SDNN: N=37, Mean=125.29±47.36 Normal: No (p=0.0045) SD2: N=37, Mean=176.89±66.81 Normal: No (p=0.0047) ============================================================ ✓ ANALYSIS COMPLETED - PROBPLOT ERROR FIXED! ============================================================
# ==============================================================================
# Cell 1: Complete Advanced Time Series Analysis
# ==============================================================================
# This cell contains all the necessary code to run the advanced time series analysis.
# It will load the data, run both the Mixed-Effects and Generalized Additive Models,
# and display the results and plots directly in the notebook output.
# --- 1. Imports and Setup ---
import pandas as pd
import statsmodels.formula.api as smf
from pygam import GAM, s, f
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
# Ignore common warnings for a cleaner output
warnings.filterwarnings('ignore')
# --- 2. Helper Functions ---
def load_and_prepare_data(data_path=r"C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv"):
"""
Loads and prepares the HRV data for advanced analysis.
This version uses a path relative to the project root directory,
making it compatible with notebooks launched from the project root.
"""
try:
data = pd.read_csv(data_path)
# Clean data
data = data.dropna(subset=['Subject', 'Sol'])
data['Subject'] = data['Subject'].astype(str)
data['Sol'] = data['Sol'].astype(int)
# For pygam, we need numerical codes for subjects to use as factors
data['Subject_code'] = data['Subject'].astype('category').cat.codes
print(f"✓ Data loaded successfully from {data_path}")
print(f"✓ Data shape: {data.shape}")
print(f"✓ Subjects: {data['Subject'].unique()}")
print(f"✓ Sol range: {data['Sol'].min()} - {data['Sol'].max()}")
return data
except FileNotFoundError:
print(f"✗ Error: Data file not found at {data_path}")
print(" Please ensure the path is correct and that your Jupyter Notebook")
print(" is being run from the project's root directory ('Data/').")
return None
except Exception as e:
print(f"✗ Error loading data: {e}")
return None
def run_mixed_effects_models(data, metrics):
"""
Fits and summarizes a Mixed-Effects Model for each specified metric.
"""
print("\n" + "="*60)
print(" Recommendation 1: Running Mixed-Effects Models (MEMs)")
print("="*60)
for metric in metrics:
if metric not in data.columns:
print(f"\n--- Skipping {metric.upper()}: Column not found ---")
continue
print(f"\n--- Analyzing Metric: {metric.upper()} ---")
model_formula = f"{metric} ~ Sol"
try:
model_data = data[['Sol', 'Subject', metric]].dropna()
if len(model_data['Subject'].unique()) < 2 or len(model_data) < 15:
print("Not enough data to robustly fit the model.")
continue
mem = smf.mixedlm(model_formula, model_data, groups=model_data["Subject"], re_formula="~Sol")
result = mem.fit(method='powell')
print(result.summary())
print("\nInterpretation:")
p_value = result.pvalues['Sol']
coef = result.params['Sol']
interpretation = "a significant" if p_value < 0.05 else "no significant"
direction = "increase" if coef > 0 else "decrease"
print(f"The fixed effect for 'Sol' (P-value: {p_value:.4f}) indicates that, on average, there is {interpretation} {direction} in {metric.upper()} per day across the mission.")
except Exception as e:
print(f"Could not fit model for {metric}. Error: {e}")
def run_generalized_additive_models(data, metrics):
"""
Fits and summarizes a Generalized Additive Model for each specified metric.
"""
print("\n" + "="*60)
print(" Recommendation 2: Running Generalized Additive Models (GAMs)")
print("="*60)
for metric in metrics:
if metric not in data.columns:
print(f"\n--- Skipping {metric.upper()}: Column not found ---")
continue
print(f"\n--- Analyzing Metric: {metric.upper()} ---")
try:
model_data = data[['Sol', 'Subject', 'Subject_code', metric]].dropna()
if len(model_data['Subject_code'].unique()) < 2 or len(model_data) < 15:
print("Not enough data to robustly fit the model.")
continue
X = model_data[['Sol', 'Subject_code']]
y = model_data[metric]
gam = GAM(s(0, n_splines=10) + f(1)).fit(X, y)
print(gam.summary())
# Plot the results
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
XX = gam.generate_X_grid(term=0)
pdep, confi = gam.partial_dependence(term=0, X=XX, width=0.95)
ax.plot(XX[:, 0], pdep, color='royalblue', linewidth=3)
ax.fill_between(XX[:, 0], confi[:, 0], confi[:, 1], color='cornflowerblue', alpha=0.3, label='95% Confidence Interval')
sns.scatterplot(x='Sol', y=metric, hue='Subject', data=model_data, ax=ax, alpha=0.6, palette='husl')
ax.set_title(f"Non-Linear Trend for {metric.upper()} (GAM Analysis)", fontsize=16)
ax.set_xlabel("Sol (Mission Day)", fontsize=12)
ax.set_ylabel(metric.upper(), fontsize=12)
ax.grid(True, linestyle='--', alpha=0.6)
plt.legend(title='Subject', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
# In a notebook, plt.show() will display the plot directly below the cell.
plt.show()
except Exception as e:
print(f"Could not fit model for {metric}. An unexpected error occurred: {e}")
# --- 3. Main Execution ---
# Define the metrics to analyze
parasympathetic_metrics = ['rmssd', 'pnni_50', 'pnni_20', 'hf', 'hfnu', 'sd1']
# Load the data
hrv_data = load_and_prepare_data()
# Run the analyses if data was loaded successfully
if hrv_data is not None:
run_mixed_effects_models(hrv_data, parasympathetic_metrics)
run_generalized_additive_models(hrv_data, parasympathetic_metrics)
✓ Data loaded successfully from C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv
✓ Data shape: (37, 34)
✓ Subjects: ['T01_Mara' 'T02_Laura' 'T03_Nancy' 'T04_Michelle' 'T05_Felicitas'
'T06_Mara_Selena' 'T07_Geraldinn' 'T08_Karina']
✓ Sol range: 2 - 16
============================================================
Recommendation 1: Running Mixed-Effects Models (MEMs)
============================================================
--- Analyzing Metric: RMSSD ---
Mixed Linear Model Regression Results
==========================================================
Model: MixedLM Dependent Variable: rmssd
No. Observations: 37 Method: REML
No. Groups: 8 Scale: 47.8174
Min. group size: 2 Log-Likelihood: -126.5957
Max. group size: 8 Converged: Yes
Mean group size: 4.6
----------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
----------------------------------------------------------
Intercept 14.181 3.072 4.616 0.000 8.160 20.203
Sol -0.242 0.282 -0.860 0.390 -0.794 0.310
Group Var 21.731 5.253
Group x Sol Cov 0.235
Sol Var 0.003
==========================================================
Interpretation:
The fixed effect for 'Sol' (P-value: 0.3898) indicates that, on average, there is no significant decrease in RMSSD per day across the mission.
--- Analyzing Metric: PNNI_50 ---
Mixed Linear Model Regression Results
==========================================================
Model: MixedLM Dependent Variable: pnni_50
No. Observations: 37 Method: REML
No. Groups: 8 Scale: 3.1368
Min. group size: 2 Log-Likelihood: -77.7306
Max. group size: 8 Converged: Yes
Mean group size: 4.6
----------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
----------------------------------------------------------
Intercept 1.249 0.735 1.699 0.089 -0.192 2.690
Sol -0.045 0.070 -0.641 0.521 -0.181 0.092
Group Var 0.965 1.078
Group x Sol Cov 0.001
Sol Var 0.000
==========================================================
Interpretation:
The fixed effect for 'Sol' (P-value: 0.5213) indicates that, on average, there is no significant decrease in PNNI_50 per day across the mission.
--- Analyzing Metric: PNNI_20 ---
Mixed Linear Model Regression Results
==========================================================
Model: MixedLM Dependent Variable: pnni_20
No. Observations: 37 Method: REML
No. Groups: 8 Scale: 26.9741
Min. group size: 2 Log-Likelihood: -115.3039
Max. group size: 8 Converged: Yes
Mean group size: 4.6
----------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
----------------------------------------------------------
Intercept 6.365 2.177 2.923 0.003 2.097 10.632
Sol -0.176 0.200 -0.880 0.379 -0.569 0.216
Group Var 9.729
Group x Sol Cov -0.108
Sol Var 0.001
==========================================================
Interpretation:
The fixed effect for 'Sol' (P-value: 0.3791) indicates that, on average, there is no significant decrease in PNNI_20 per day across the mission.
--- Analyzing Metric: HF ---
Mixed Linear Model Regression Results
============================================================
Model: MixedLM Dependent Variable: hf
No. Observations: 37 Method: REML
No. Groups: 8 Scale: 5188.7019
Min. group size: 2 Log-Likelihood: -207.8407
Max. group size: 8 Converged: Yes
Mean group size: 4.6
------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
------------------------------------------------------------
Intercept 40.869 27.782 1.471 0.141 -13.582 95.320
Sol -0.099 3.138 -0.032 0.975 -6.249 6.051
Group Var 457.973 18.139
Group x Sol Cov 40.235
Sol Var 10.097
============================================================
Interpretation:
The fixed effect for 'Sol' (P-value: 0.9748) indicates that, on average, there is no significant decrease in HF per day across the mission.
--- Analyzing Metric: HFNU ---
Mixed Linear Model Regression Results
==========================================================
Model: MixedLM Dependent Variable: hfnu
No. Observations: 37 Method: REML
No. Groups: 8 Scale: 7.9614
Min. group size: 2 Log-Likelihood: -95.1120
Max. group size: 8 Converged: Yes
Mean group size: 4.6
----------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
----------------------------------------------------------
Intercept 8.455 1.323 6.390 0.000 5.862 11.048
Sol -0.232 0.126 -1.842 0.066 -0.478 0.015
Group Var 3.567 2.018
Group x Sol Cov 0.031
Sol Var 0.000
==========================================================
Interpretation:
The fixed effect for 'Sol' (P-value: 0.0655) indicates that, on average, there is no significant decrease in HFNU per day across the mission.
--- Analyzing Metric: SD1 ---
Mixed Linear Model Regression Results
==========================================================
Model: MixedLM Dependent Variable: sd1
No. Observations: 37 Method: REML
No. Groups: 8 Scale: 23.9026
Min. group size: 2 Log-Likelihood: -114.4656
Max. group size: 8 Converged: Yes
Mean group size: 4.6
----------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
----------------------------------------------------------
Intercept 10.027 2.174 4.613 0.000 5.767 14.288
Sol -0.171 0.199 -0.860 0.390 -0.561 0.219
Group Var 10.918 3.727
Group x Sol Cov 0.116
Sol Var 0.001
==========================================================
Interpretation:
The fixed effect for 'Sol' (P-value: 0.3898) indicates that, on average, there is no significant decrease in SD1 per day across the mission.
============================================================
Recommendation 2: Running Generalized Additive Models (GAMs)
============================================================
--- Analyzing Metric: RMSSD ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -178.2912
Number of Samples: 37 AIC: 379.8952
AICc: 392.0156
GCV: 98.0213
Scale: 49.0325
Pseudo R-Squared: 0.4969
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 7.65e-01
f(1) [0.6] 8 5.3 8.65e-03 **
intercept 1 0.0 2.03e-06 ***
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
None
--- Analyzing Metric: PNNI_50 ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -80.8561
Number of Samples: 37 AIC: 185.025
AICc: 197.1454
GCV: 6.3395
Scale: 3.1712
Pseudo R-Squared: 0.4271
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 8.94e-01
f(1) [0.6] 8 5.3 4.11e-02 *
intercept 1 0.0 1.70e-01
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
None
--- Analyzing Metric: PNNI_20 ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -156.9732
Number of Samples: 37 AIC: 337.2592
AICc: 349.3795
GCV: 54.7781
Scale: 27.4013
Pseudo R-Squared: 0.4209
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 9.49e-01
f(1) [0.6] 8 5.3 4.71e-02 *
intercept 1 0.0 2.86e-03 **
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
None
--- Analyzing Metric: HF ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -352.976
Number of Samples: 37 AIC: 729.2647
AICc: 741.3851
GCV: 11087.6823
Scale: 5546.3138
Pseudo R-Squared: 0.4016
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 8.62e-01
f(1) [0.6] 8 5.3 5.52e-02 .
intercept 1 0.0 1.63e-01
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
None
--- Analyzing Metric: HFNU ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -110.8589
Number of Samples: 37 AIC: 245.0305
AICc: 257.1509
GCV: 15.2293
Scale: 7.618
Pseudo R-Squared: 0.5816
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 8.23e-02 .
f(1) [0.6] 8 5.3 4.28e-03 **
intercept 1 0.0 1.88e-09 ***
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
None
--- Analyzing Metric: SD1 ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -152.9113
Number of Samples: 37 AIC: 329.1353
AICc: 341.2557
GCV: 49.0078
Scale: 24.5148
Pseudo R-Squared: 0.4969
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 7.65e-01
f(1) [0.6] 8 5.3 8.65e-03 **
intercept 1 0.0 2.03e-06 ***
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
None
def init(self, data_path=(r"C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv")):
# =================================================================================
# Cell: Comprehensive Parasympathetic Time Analysis for Space Crew (Corrected)
# =================================================================================
# This notebook cell provides a complete, in-depth analysis of parasympathetic
# nervous system activity in space crew members throughout a mission timeline.
# It is designed to be run from top to bottom.
#
# The analysis is based on scientifically validated Heart Rate Variability (HRV) metrics.
# Parasympathetic tone, which reflects the body's "rest-and-digest" state, is
# assessed through three primary domains of HRV:
#
# 1. **Time-Domain Metrics:** RMSSD, pNN50, pNN20
# - These metrics quantify short-term variations in beat-to-beat intervals.
# - Higher values generally indicate greater parasympathetic influence.
#
# 2. **Frequency-Domain Metrics:** HF Power (High Frequency), HFnu (Normalized Units)
# - HF power (0.15-0.4 Hz) is strongly associated with respiratory sinus arrhythmia
# and is a widely accepted marker of parasympathetic (vagal) modulation.
#
# 3. **Nonlinear Metrics:** SD1 (from Poincaré plot analysis)
# - SD1 represents the standard deviation of the short-term RR interval variability
# and is highly correlated with time-domain parasympathetic indices.
#
# This self-contained cell will:
# - Load and prepare the complete HRV dataset.
# - Calculate and display summary statistics for each crew member.
# - Perform detailed statistical analysis, including:
# - ANOVA to test for differences between crew members.
# - Correlation analysis to examine trends over the mission duration.
# - Post-hoc tests (Tukey's HSD) to identify specific group differences.
# - Generate and display comprehensive visualizations for:
# - Longitudinal trends for each individual.
# - Comparisons between crew members.
# - Analysis across different mission phases.
# - Autonomic balance (sympathetic vs. parasympathetic).
# - Generate and print a full scientific report summarizing the findings.
# =================================================================================
# --- 1. Imports and Setup ---
# Import all necessary libraries for data handling, statistics, and plotting.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import f_oneway, ttest_ind, pearsonr, spearmanr
from statsmodels.stats.multicomp import pairwise_tukeyhsd
import warnings
# Ignore common warnings for a cleaner, more readable output.
warnings.filterwarnings('ignore')
# --- 2. Plotting Style Configuration ---
# Set a professional and consistent style for all plots to ensure high-quality
# visualizations suitable for reports and presentations.
# CORRECTED: The style name uses a hyphen, not an underscore.
plt.style.use('seaborn-v0_8-darkgrid') # A visually appealing and informative style
sns.set_palette("husl", 8) # Use a color palette suitable for categorical data with 8 subjects
plt.rcParams['figure.figsize'] = (16, 10)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.titlesize'] = 16
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 11
plt.rcParams['figure.dpi'] = 100 # Set a good resolution for the plots
# --- 3. The ParasympatheticAnalyzer Class ---
# We encapsulate the entire analysis within a class. This is a good practice for
# organizing complex analytical workflows, making the code reusable and easy to manage.
class ParasympatheticAnalyzer:
"""
A comprehensive analyzer for parasympathetic nervous system activity
in space crew members using validated HRV metrics.
"""
def __init__(self, data_path=(r"C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv")):
"""
Initializes the analyzer by loading and preparing the HRV data.
Parameters:
- data_path (str): The relative path to the complete HRV results CSV file.
This path should be correct if the notebook is run from
the project's root directory ('Data/').
"""
self.data_path = data_path
self.data = None
# Define the key parasympathetic metrics we will be analyzing.
self.parasympathetic_metrics = ['rmssd', 'pnni_50', 'pnni_20', 'hf', 'hfnu', 'sd1']
# Map subject IDs to their real names for more readable plots and reports.
self.crew_names = {
'T01_Mara': 'Mara', 'T02_Laura': 'Laura', 'T03_Nancy': 'Nancy',
'T04_Michelle': 'Michelle', 'T05_Felicitas': 'Felicitas',
'T06_Mara_Selena': 'Mara Selena', 'T07_Geraldinn': 'Geraldinn', 'T08_Karina': 'Karina'
}
self.load_data()
def load_data(self):
"""
This method handles loading the data from the CSV file and preparing it for analysis.
This includes cleaning, sorting, and feature engineering (like creating mission phases).
"""
print("--- Loading and Preparing Data ---")
try:
self.data = pd.read_csv(self.data_path)
# Ensure data types are correct for analysis and plotting.
self.data['Subject'] = self.data['Subject'].astype(str)
self.data['Sol'] = self.data['Sol'].astype(int)
# Add the crew names using the map defined earlier.
self.data['Crew_Name'] = self.data['Subject'].map(self.crew_names)
# Sort data by subject and time, crucial for any time-series analysis.
self.data = self.data.sort_values(['Subject', 'Sol'])
# Create mission phase categories based on the mission day (Sol).
self.data['Mission_Phase'] = self.data['Sol'].apply(self._categorize_mission_phase)
print(f"✓ Data loaded successfully: {len(self.data)} recordings found.")
print(f"✓ Crew members found: {len(self.data['Subject'].unique())}")
print(f"✓ Mission Day (Sol) range: {self.data['Sol'].min()} to {self.data['Sol'].max()}")
except FileNotFoundError:
print(f"✗ ERROR: Data file not found at '{self.data_path}'")
print(" Please ensure the path is correct and your notebook is run from the project root directory.")
except Exception as e:
print(f"✗ An unexpected error occurred during data loading: {e}")
def _categorize_mission_phase(self, sol):
"""A helper function to categorize mission days into distinct phases."""
if sol <= 5:
return 'Early Mission (Sols 1-5)'
elif sol <= 10:
return 'Mid Mission (Sols 6-10)'
else:
return 'Late Mission (Sols 11+)'
def perform_statistical_analysis(self):
"""
This is the core statistical engine of the analysis. It performs
a suite of tests to uncover key insights from the data.
"""
print("\n\n--- Performing Comprehensive Statistical Analysis ---")
results = {}
# --------------------------------------------------------------------------
# Test 1: One-Way ANOVA (Analysis of Variance)
# --------------------------------------------------------------------------
# Purpose: To determine if there are any statistically significant differences
# in the mean HRV values *between* the different crew members. A low p-value
# (typically < 0.05) suggests that at least one crew member is different from the others.
print("\n[1] One-Way ANOVA: Testing for differences between crew members...")
for metric in self.parasympathetic_metrics:
if metric in self.data.columns:
groups = [self.data[self.data['Subject'] == crew][metric].dropna() for crew in self.data['Subject'].unique()]
groups = [g for g in groups if len(g) > 0]
if len(groups) >= 2:
f_stat, p_value = f_oneway(*groups)
results[f'anova_{metric}'] = {'F-statistic': f_stat, 'p-value': p_value}
sig_marker = '***' if p_value < 0.001 else '**' if p_value < 0.01 else '*' if p_value < 0.05 else 'ns (not significant)'
print(f" - {metric.upper():<8}: F={f_stat:.3f}, p={p_value:.4f} ({sig_marker})")
# --------------------------------------------------------------------------
# Test 2: Correlation Analysis (Pearson & Spearman)
# --------------------------------------------------------------------------
# Purpose: To assess the relationship between mission time (Sol) and each HRV metric.
# - Pearson correlation measures the *linear* relationship.
# - Spearman correlation measures the *monotonic* relationship (whether it consistently increases or decreases, even if not in a straight line).
# A significant p-value suggests a trend over time.
print("\n[2] Correlation Analysis: Testing for trends over the mission duration (Sol)...")
for metric in self.parasympathetic_metrics:
if metric in self.data.columns:
clean_data = self.data[['Sol', metric]].dropna()
if len(clean_data) > 3:
r_pearson, p_pearson = pearsonr(clean_data['Sol'], clean_data[metric])
r_spearman, p_spearman = spearmanr(clean_data['Sol'], clean_data[metric])
results[f'corr_{metric}'] = {'pearson_r': r_pearson, 'pearson_p': p_pearson, 'spearman_r': r_spearman, 'spearman_p': p_spearman}
print(f" - {metric.upper():<8}: Spearman r={r_spearman:.3f} (p={p_spearman:.4f}), Pearson r={r_pearson:.3f} (p={p_pearson:.4f})")
# --------------------------------------------------------------------------
# Test 3: Post-Hoc Analysis (Tukey's HSD - Honest Significant Difference)
# --------------------------------------------------------------------------
# Purpose: When an ANOVA test is significant, it tells us that a difference exists
# *somewhere* among the groups, but not *which specific* groups are different.
# Tukey's HSD performs pairwise comparisons (e.g., Mara vs. Laura, Mara vs. Nancy, etc.)
# to pinpoint exactly where the significant differences lie.
print("\n[3] Post-Hoc Analysis (Tukey's HSD): Pinpointing specific crew differences...")
for metric in self.parasympathetic_metrics:
if f'anova_{metric}' in results and results[f'anova_{metric}']['p-value'] < 0.05:
print(f"\n - Tukey HSD results for {metric.upper()} (since ANOVA was significant):")
clean_data = self.data[[metric, 'Subject']].dropna()
tukey_result = pairwise_tukeyhsd(endog=clean_data[metric], groups=clean_data['Subject'], alpha=0.05)
results[f'tukey_{metric}'] = tukey_result
# Displaying the summary table from the test.
print(tukey_result)
self.statistical_results = results
print("\n✓ Statistical analysis complete.")
def plot_longitudinal_trends(self):
"""
Creates comprehensive longitudinal trend plots for each metric.
This visualization is crucial for observing how each crew member's physiology
changes over the course of the 15-day mission.
"""
print("\n--- Plotting Longitudinal Trends ---")
fig, axes = plt.subplots(3, 2, figsize=(20, 22))
axes = axes.flatten()
for i, metric in enumerate(self.parasympathetic_metrics):
ax = axes[i]
# Plot individual trajectories for each crew member
for subject_id, crew_data in self.data.groupby('Subject'):
ax.plot(crew_data['Sol'], crew_data[metric], marker='o', linestyle='-', alpha=0.8, label=self.crew_names[subject_id])
# Also plot a linear trend line to visualize the general direction
if len(crew_data['Sol'].dropna()) > 1:
z = np.polyfit(crew_data['Sol'].dropna(), crew_data[metric].dropna(), 1)
p = np.poly1d(z)
ax.plot(crew_data['Sol'], p(crew_data['Sol']), linestyle='--', alpha=0.6, color=ax.get_lines()[-1].get_color())
ax.set_title(f'Longitudinal Trend: {metric.upper()}', fontweight='bold')
ax.set_xlabel('Sol (Mission Day)')
ax.set_ylabel(f'{metric.upper()} Value')
ax.grid(True, which='both', linestyle='--', linewidth=0.5)
# Create a single, shared legend for the entire figure
handles, labels = axes[0].get_legend_handles_labels()
# We take only the first of every two lines (to avoid duplicate labels from the trend lines)
unique_handles = [handles[i] for i in range(0, len(handles), 2)]
unique_labels = [labels[i] for i in range(0, len(labels), 2)]
fig.legend(unique_handles, unique_labels, loc='upper right', bbox_to_anchor=(1.1, 0.95), title="Crew Member")
plt.tight_layout(rect=[0, 0, 0.9, 1]) # Adjust layout to make room for the legend
plt.suptitle('Parasympathetic HRV Metrics Over Mission Time', fontsize=20, y=1.02)
plt.show()
def plot_mission_phase_analysis(self):
"""
Analyzes and visualizes parasympathetic activity across the predefined mission phases.
This helps to understand if there are adaptation effects in the early, mid, or late stages.
"""
print("\n--- Plotting Mission Phase Analysis ---")
fig, axes = plt.subplots(3, 2, figsize=(20, 22))
axes = axes.flatten()
phase_order = ['Early Mission (Sols 1-5)', 'Mid Mission (Sols 6-10)', 'Late Mission (Sols 11+)']
for i, metric in enumerate(self.parasympathetic_metrics):
ax = axes[i]
# A boxplot shows the distribution (median, quartiles, range) of data for each phase.
sns.boxplot(data=self.data, x='Mission_Phase', y=metric, ax=ax, order=phase_order, showfliers=False)
# A stripplot overlays the individual data points to show the raw data distribution.
sns.stripplot(data=self.data, x='Mission_Phase', y=metric, ax=ax, order=phase_order, color='black', alpha=0.5, size=4)
ax.set_title(f'Analysis by Mission Phase: {metric.upper()}', fontweight='bold')
ax.set_xlabel('Mission Phase')
ax.set_ylabel(f'{metric.upper()} Value')
ax.tick_params(axis='x', rotation=10) # Slightly rotate labels for readability
plt.tight_layout()
plt.suptitle('Parasympathetic HRV Metrics Across Mission Phases', fontsize=20, y=1.02)
plt.show()
def generate_scientific_report(self):
"""
Generates and prints a formatted scientific report summarizing all the
key findings from the statistical analysis.
"""
print("\n\n" + "="*80)
print(" COMPREHENSIVE PARASYMPATHETIC NERVOUS SYSTEM REPORT")
print("="*80)
report = [
"\n### 1. INTRODUCTION ###",
"This report details the analysis of parasympathetic nervous system (PNS) activity in the space analog crew.",
"PNS activity, often termed the 'rest-and-digest' system, is a critical indicator of stress, recovery, and autonomic health.",
"The analysis utilizes established Heart Rate Variability (HRV) metrics to quantify PNS tone.",
"\n### 2. METHODOLOGY ###",
"The following validated HRV metrics were used to assess parasympathetic activity:",
"- RMSSD & SD1: Reflect short-term, beat-to-beat variability (vagal tone).",
"- pNN50/pNN20: Percentage of successive beats that differ by more than 50ms/20ms.",
"- HF/HFnu: High-frequency power, a direct marker of vagal modulation of the heart.",
"\nStatistical analyses included One-Way ANOVA, Tukey's HSD for pairwise comparisons, and Spearman/Pearson correlation to assess trends over time.",
"\n### 3. STATISTICAL RESULTS ###"
]
# ANOVA Summary
report.append("\n--- Between-Crew Differences (ANOVA) ---")
report.append("This test checks if there are significant differences in the average metric values among crew members over the entire mission.")
for metric in self.parasympathetic_metrics:
if f'anova_{metric}' in self.statistical_results:
res = self.statistical_results[f'anova_{metric}']
sig = 'p < 0.05, SIGNIFICANT' if res['p-value'] < 0.05 else 'p > 0.05, not significant'
report.append(f" - {metric.upper()}: F={res['F-statistic']:.2f}, p={res['p-value']:.3f} ({sig})")
# Correlation Summary
report.append("\n--- Mission Time Trend Analysis (Correlation) ---")
report.append("This test checks if metrics generally increased or decreased over the course of the mission for the crew as a whole.")
for metric in self.parasympathetic_metrics:
if f'corr_{metric}' in self.statistical_results:
res = self.statistical_results[f'corr_{metric}']
sig = 'p < 0.05, SIGNIFICANT' if res['spearman_p'] < 0.05 else 'p > 0.05, not significant'
report.append(f" - {metric.upper()}: Spearman r={res['spearman_r']:.3f} (p={res['spearman_p']:.3f}, {sig})")
# Tukey HSD Summary
report.append("\n--- Pairwise Crew Comparisons (Tukey's HSD) ---")
report.append("For metrics where ANOVA was significant, this test identifies which specific crew members differed from each other.")
for metric in self.parasympathetic_metrics:
if f'tukey_{metric}' in self.statistical_results:
report.append(f"\n Significant differences for {metric.upper()}:")
report.append(str(self.statistical_results[f'tukey_{metric}']))
print("\n".join(report))
print("\n" + "="*80)
print(" END OF REPORT")
print("="*80)
# --- 4. Main Execution Block ---
# This is where we create an instance of our analyzer and run all the methods
# in a logical sequence.
# Create the analyzer object. This will automatically load and prepare the data.
analyzer = ParasympatheticAnalyzer()
# Check if data was loaded successfully before proceeding.
if analyzer.data is not None:
# Run all the statistical tests and store the results.
analyzer.perform_statistical_analysis()
# Generate and display the plots.
analyzer.plot_longitudinal_trends()
analyzer.plot_mission_phase_analysis()
# Generate and print the final, formatted text report.
analyzer.generate_scientific_report()
else:
print("\nAnalysis could not proceed because data failed to load.")
--- Loading and Preparing Data ---
✓ Data loaded successfully: 37 recordings found.
✓ Crew members found: 8
✓ Mission Day (Sol) range: 2 to 16
--- Performing Comprehensive Statistical Analysis ---
[1] One-Way ANOVA: Testing for differences between crew members...
- RMSSD : F=3.400, p=0.0090 (**)
- PNNI_50 : F=2.485, p=0.0396 (*)
- PNNI_20 : F=2.354, p=0.0493 (*)
- HF : F=2.320, p=0.0523 (ns (not significant))
- HFNU : F=3.547, p=0.0071 (**)
- SD1 : F=3.399, p=0.0090 (**)
[2] Correlation Analysis: Testing for trends over the mission duration (Sol)...
- RMSSD : Spearman r=-0.279 (p=0.0945), Pearson r=-0.118 (p=0.4856)
- PNNI_50 : Spearman r=-0.308 (p=0.0640), Pearson r=-0.099 (p=0.5607)
- PNNI_20 : Spearman r=-0.241 (p=0.1500), Pearson r=-0.135 (p=0.4271)
- HF : Spearman r=-0.214 (p=0.2033), Pearson r=-0.001 (p=0.9933)
- HFNU : Spearman r=-0.318 (p=0.0553), Pearson r=-0.279 (p=0.0946)
- SD1 : Spearman r=-0.278 (p=0.0962), Pearson r=-0.118 (p=0.4855)
[3] Post-Hoc Analysis (Tukey's HSD): Pinpointing specific crew differences...
- Tukey HSD results for RMSSD (since ANOVA was significant):
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura -2.56 0.9993 -17.9577 12.8377 False
T01_Mara T03_Nancy 0.285 1.0 -13.6428 14.2128 False
T01_Mara T04_Michelle -3.1225 0.9952 -17.0503 10.8053 False
T01_Mara T05_Felicitas 13.15 0.0293 0.8669 25.4331 True
T01_Mara T06_Mara_Selena -1.345 1.0 -13.6281 10.9381 False
T01_Mara T07_Geraldinn -2.105 0.9996 -16.0328 11.8228 False
T01_Mara T08_Karina -5.495 0.9713 -23.4757 12.4857 False
T02_Laura T03_Nancy 2.845 0.9993 -14.526 20.216 False
T02_Laura T04_Michelle -0.5625 1.0 -17.9335 16.8085 False
T02_Laura T05_Felicitas 15.71 0.0593 -0.3724 31.7924 False
T02_Laura T06_Mara_Selena 1.215 1.0 -14.8674 17.2974 False
T02_Laura T07_Geraldinn 0.455 1.0 -16.916 17.826 False
T02_Laura T08_Karina -2.935 0.9997 -23.6973 17.8273 False
T03_Nancy T04_Michelle -3.4075 0.9966 -19.4899 12.6749 False
T03_Nancy T05_Felicitas 12.865 0.1193 -1.8161 27.5461 False
T03_Nancy T06_Mara_Selena -1.63 1.0 -16.3111 13.0511 False
T03_Nancy T07_Geraldinn -2.39 0.9997 -18.4724 13.6924 False
T03_Nancy T08_Karina -5.78 0.977 -25.4768 13.9168 False
T04_Michelle T05_Felicitas 16.2725 0.0218 1.5914 30.9536 True
T04_Michelle T06_Mara_Selena 1.7775 0.9999 -12.9036 16.4586 False
T04_Michelle T07_Geraldinn 1.0175 1.0 -15.0649 17.0999 False
T04_Michelle T08_Karina -2.3725 0.9999 -22.0693 17.3243 False
T05_Felicitas T06_Mara_Selena -14.495 0.0226 -27.6262 -1.3638 True
T05_Felicitas T07_Geraldinn -15.255 0.0373 -29.9361 -0.5739 True
T05_Felicitas T08_Karina -18.645 0.0485 -37.2153 -0.0747 True
T06_Mara_Selena T07_Geraldinn -0.76 1.0 -15.4411 13.9211 False
T06_Mara_Selena T08_Karina -4.15 0.9953 -22.7203 14.4203 False
T07_Geraldinn T08_Karina -3.39 0.9991 -23.0868 16.3068 False
-----------------------------------------------------------------------
- Tukey HSD results for PNNI_50 (since ANOVA was significant):
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=====================================================================
group1 group2 meandiff p-adj lower upper reject
---------------------------------------------------------------------
T01_Mara T02_Laura -0.1483 1.0 -4.063 3.7663 False
T01_Mara T03_Nancy 0.3775 1.0 -3.1634 3.9184 False
T01_Mara T04_Michelle -0.3125 1.0 -3.8534 3.2284 False
T01_Mara T05_Felicitas 3.225 0.0391 0.1022 6.3478 True
T01_Mara T06_Mara_Selena 0.2717 1.0 -2.8511 3.3945 False
T01_Mara T07_Geraldinn -0.2375 1.0 -3.7784 3.3034 False
T01_Mara T08_Karina -0.415 1.0 -4.9863 4.1563 False
T02_Laura T03_Nancy 0.5258 0.9999 -3.8905 4.9421 False
T02_Laura T04_Michelle -0.1642 1.0 -4.5805 4.2521 False
T02_Laura T05_Felicitas 3.3733 0.1658 -0.7154 7.462 False
T02_Laura T06_Mara_Selena 0.42 1.0 -3.6687 4.5087 False
T02_Laura T07_Geraldinn -0.0892 1.0 -4.5055 4.3271 False
T02_Laura T08_Karina -0.2667 1.0 -5.5451 5.0118 False
T03_Nancy T04_Michelle -0.69 0.9992 -4.7787 3.3987 False
T03_Nancy T05_Felicitas 2.8475 0.2399 -0.8849 6.5799 False
T03_Nancy T06_Mara_Selena -0.1058 1.0 -3.8383 3.6266 False
T03_Nancy T07_Geraldinn -0.615 0.9996 -4.7037 3.4737 False
T03_Nancy T08_Karina -0.7925 0.9995 -5.8001 4.2151 False
T04_Michelle T05_Felicitas 3.5375 0.073 -0.1949 7.2699 False
T04_Michelle T06_Mara_Selena 0.5842 0.9995 -3.1483 4.3166 False
T04_Michelle T07_Geraldinn 0.075 1.0 -4.0137 4.1637 False
T04_Michelle T08_Karina -0.1025 1.0 -5.1101 4.9051 False
T05_Felicitas T06_Mara_Selena -2.9533 0.1128 -6.2917 0.3851 False
T05_Felicitas T07_Geraldinn -3.4625 0.0841 -7.1949 0.2699 False
T05_Felicitas T08_Karina -3.64 0.2291 -8.3612 1.0812 False
T06_Mara_Selena T07_Geraldinn -0.5092 0.9998 -4.2416 3.2233 False
T06_Mara_Selena T08_Karina -0.6867 0.9997 -5.4079 4.0345 False
T07_Geraldinn T08_Karina -0.1775 1.0 -5.1851 4.8301 False
---------------------------------------------------------------------
- Tukey HSD results for PNNI_20 (since ANOVA was significant):
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura 0.6396 1.0 -10.921 12.2002 False
T01_Mara T03_Nancy 3.3688 0.9619 -7.0882 13.8257 False
T01_Mara T04_Michelle 0.2713 1.0 -10.1857 10.7282 False
T01_Mara T05_Felicitas 9.8429 0.03 0.6207 19.0651 True
T01_Mara T06_Mara_Selena 1.5379 0.9993 -7.6843 10.7601 False
T01_Mara T07_Geraldinn 1.3563 0.9999 -9.1007 11.8132 False
T01_Mara T08_Karina -1.6837 0.9999 -15.1836 11.8161 False
T02_Laura T03_Nancy 2.7292 0.9969 -10.313 15.7713 False
T02_Laura T04_Michelle -0.3683 1.0 -13.4105 12.6738 False
T02_Laura T05_Felicitas 9.2033 0.2408 -2.8713 21.278 False
T02_Laura T06_Mara_Selena 0.8983 1.0 -11.1763 12.973 False
T02_Laura T07_Geraldinn 0.7167 1.0 -12.3255 13.7588 False
T02_Laura T08_Karina -2.3233 0.9996 -17.9117 13.265 False
T03_Nancy T04_Michelle -3.0975 0.9893 -15.1722 8.9772 False
T03_Nancy T05_Felicitas 6.4742 0.5518 -4.5484 17.4968 False
T03_Nancy T06_Mara_Selena -1.8308 0.9993 -12.8534 9.1918 False
T03_Nancy T07_Geraldinn -2.0125 0.9993 -14.0872 10.0622 False
T03_Nancy T08_Karina -5.0525 0.9483 -19.8409 9.7359 False
T04_Michelle T05_Felicitas 9.5717 0.1257 -1.4509 20.5943 False
T04_Michelle T06_Mara_Selena 1.2667 0.9999 -9.7559 12.2893 False
T04_Michelle T07_Geraldinn 1.085 1.0 -10.9897 13.1597 False
T04_Michelle T08_Karina -1.955 0.9998 -16.7434 12.8334 False
T05_Felicitas T06_Mara_Selena -8.305 0.1487 -18.1639 1.5539 False
T05_Felicitas T07_Geraldinn -8.4867 0.2305 -19.5093 2.5359 False
T05_Felicitas T08_Karina -11.5267 0.164 -25.4693 2.416 False
T06_Mara_Selena T07_Geraldinn -0.1817 1.0 -11.2043 10.8409 False
T06_Mara_Selena T08_Karina -3.2217 0.9943 -17.1643 10.721 False
T07_Geraldinn T08_Karina -3.04 0.9972 -17.8284 11.7484 False
-----------------------------------------------------------------------
- Tukey HSD results for HFNU (since ANOVA was significant):
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura -2.7446 0.8683 -9.3362 3.847 False
T01_Mara T03_Nancy -4.5238 0.2456 -10.4861 1.4386 False
T01_Mara T04_Michelle -6.6712 0.02 -12.6336 -0.7089 True
T01_Mara T05_Felicitas -2.1363 0.8818 -7.3945 3.122 False
T01_Mara T06_Mara_Selena -6.3529 0.0097 -11.6112 -1.0947 True
T01_Mara T07_Geraldinn -4.8713 0.1742 -10.8336 1.0911 False
T01_Mara T08_Karina -5.7813 0.2563 -13.4786 1.9161 False
T02_Laura T03_Nancy -1.7792 0.9929 -9.2155 5.6571 False
T02_Laura T04_Michelle -3.9267 0.6735 -11.363 3.5096 False
T02_Laura T05_Felicitas 0.6083 1.0 -6.2764 7.493 False
T02_Laura T06_Mara_Selena -3.6083 0.6814 -10.493 3.2764 False
T02_Laura T07_Geraldinn -2.1267 0.9801 -9.563 5.3096 False
T02_Laura T08_Karina -3.0367 0.9483 -11.9248 5.8514 False
T03_Nancy T04_Michelle -2.1475 0.9679 -9.0322 4.7372 False
T03_Nancy T05_Felicitas 2.3875 0.9131 -3.8973 8.6723 False
T03_Nancy T06_Mara_Selena -1.8292 0.978 -8.114 4.4557 False
T03_Nancy T07_Geraldinn -0.3475 1.0 -7.2322 6.5372 False
T03_Nancy T08_Karina -1.2575 0.9996 -9.6895 7.1745 False
T04_Michelle T05_Felicitas 4.535 0.3007 -1.7498 10.8198 False
T04_Michelle T06_Mara_Selena 0.3183 1.0 -5.9665 6.6032 False
T04_Michelle T07_Geraldinn 1.8 0.988 -5.0847 8.6847 False
T04_Michelle T08_Karina 0.89 1.0 -7.542 9.322 False
T05_Felicitas T06_Mara_Selena -4.2167 0.2577 -9.838 1.4047 False
T05_Felicitas T07_Geraldinn -2.735 0.841 -9.0198 3.5498 False
T05_Felicitas T08_Karina -3.645 0.8036 -11.5947 4.3047 False
T06_Mara_Selena T07_Geraldinn 1.4817 0.9935 -4.8032 7.7665 False
T06_Mara_Selena T08_Karina 0.5717 1.0 -7.3781 8.5214 False
T07_Geraldinn T08_Karina -0.91 1.0 -9.342 7.522 False
-----------------------------------------------------------------------
- Tukey HSD results for SD1 (since ANOVA was significant):
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura -1.8121 0.9993 -12.7001 9.0759 False
T01_Mara T03_Nancy 0.2012 1.0 -9.6473 10.0498 False
T01_Mara T04_Michelle -2.2062 0.9952 -12.0548 7.6423 False
T01_Mara T05_Felicitas 9.2979 0.0293 0.6123 17.9835 True
T01_Mara T06_Mara_Selena -0.9521 1.0 -9.6377 7.7335 False
T01_Mara T07_Geraldinn -1.4888 0.9996 -11.3373 8.3598 False
T01_Mara T08_Karina -3.8837 0.9713 -16.5982 8.8307 False
T02_Laura T03_Nancy 2.0133 0.9993 -10.27 14.2966 False
T02_Laura T04_Michelle -0.3942 1.0 -12.6775 11.8891 False
T02_Laura T05_Felicitas 11.11 0.0592 -0.2621 22.4821 False
T02_Laura T06_Mara_Selena 0.86 1.0 -10.5121 12.2321 False
T02_Laura T07_Geraldinn 0.3233 1.0 -11.96 12.6066 False
T02_Laura T08_Karina -2.0717 0.9998 -16.753 12.6097 False
T03_Nancy T04_Michelle -2.4075 0.9966 -13.7796 8.9646 False
T03_Nancy T05_Felicitas 9.0967 0.1193 -1.2846 19.4779 False
T03_Nancy T06_Mara_Selena -1.1533 1.0 -11.5346 9.2279 False
T03_Nancy T07_Geraldinn -1.69 0.9997 -13.0621 9.6821 False
T03_Nancy T08_Karina -4.085 0.9771 -18.0129 9.8429 False
T04_Michelle T05_Felicitas 11.5042 0.0218 1.1229 21.8854 True
T04_Michelle T06_Mara_Selena 1.2542 0.9999 -9.1271 11.6354 False
T04_Michelle T07_Geraldinn 0.7175 1.0 -10.6546 12.0896 False
T04_Michelle T08_Karina -1.6775 0.9999 -15.6054 12.2504 False
T05_Felicitas T06_Mara_Selena -10.25 0.0226 -19.5353 -0.9647 True
T05_Felicitas T07_Geraldinn -10.7867 0.0373 -21.1679 -0.4054 True
T05_Felicitas T08_Karina -13.1817 0.0486 -26.3131 -0.0503 True
T06_Mara_Selena T07_Geraldinn -0.5367 1.0 -10.9179 9.8446 False
T06_Mara_Selena T08_Karina -2.9317 0.9953 -16.0631 10.1997 False
T07_Geraldinn T08_Karina -2.395 0.9991 -16.3229 11.5329 False
-----------------------------------------------------------------------
✓ Statistical analysis complete.
--- Plotting Longitudinal Trends ---
--- Plotting Mission Phase Analysis ---
================================================================================
COMPREHENSIVE PARASYMPATHETIC NERVOUS SYSTEM REPORT
================================================================================
### 1. INTRODUCTION ###
This report details the analysis of parasympathetic nervous system (PNS) activity in the space analog crew.
PNS activity, often termed the 'rest-and-digest' system, is a critical indicator of stress, recovery, and autonomic health.
The analysis utilizes established Heart Rate Variability (HRV) metrics to quantify PNS tone.
### 2. METHODOLOGY ###
The following validated HRV metrics were used to assess parasympathetic activity:
- RMSSD & SD1: Reflect short-term, beat-to-beat variability (vagal tone).
- pNN50/pNN20: Percentage of successive beats that differ by more than 50ms/20ms.
- HF/HFnu: High-frequency power, a direct marker of vagal modulation of the heart.
Statistical analyses included One-Way ANOVA, Tukey's HSD for pairwise comparisons, and Spearman/Pearson correlation to assess trends over time.
### 3. STATISTICAL RESULTS ###
--- Between-Crew Differences (ANOVA) ---
This test checks if there are significant differences in the average metric values among crew members over the entire mission.
- RMSSD: F=3.40, p=0.009 (p < 0.05, SIGNIFICANT)
- PNNI_50: F=2.49, p=0.040 (p < 0.05, SIGNIFICANT)
- PNNI_20: F=2.35, p=0.049 (p < 0.05, SIGNIFICANT)
- HF: F=2.32, p=0.052 (p > 0.05, not significant)
- HFNU: F=3.55, p=0.007 (p < 0.05, SIGNIFICANT)
- SD1: F=3.40, p=0.009 (p < 0.05, SIGNIFICANT)
--- Mission Time Trend Analysis (Correlation) ---
This test checks if metrics generally increased or decreased over the course of the mission for the crew as a whole.
- RMSSD: Spearman r=-0.279 (p=0.094, p > 0.05, not significant)
- PNNI_50: Spearman r=-0.308 (p=0.064, p > 0.05, not significant)
- PNNI_20: Spearman r=-0.241 (p=0.150, p > 0.05, not significant)
- HF: Spearman r=-0.214 (p=0.203, p > 0.05, not significant)
- HFNU: Spearman r=-0.318 (p=0.055, p > 0.05, not significant)
- SD1: Spearman r=-0.278 (p=0.096, p > 0.05, not significant)
--- Pairwise Crew Comparisons (Tukey's HSD) ---
For metrics where ANOVA was significant, this test identifies which specific crew members differed from each other.
Significant differences for RMSSD:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura -2.56 0.9993 -17.9577 12.8377 False
T01_Mara T03_Nancy 0.285 1.0 -13.6428 14.2128 False
T01_Mara T04_Michelle -3.1225 0.9952 -17.0503 10.8053 False
T01_Mara T05_Felicitas 13.15 0.0293 0.8669 25.4331 True
T01_Mara T06_Mara_Selena -1.345 1.0 -13.6281 10.9381 False
T01_Mara T07_Geraldinn -2.105 0.9996 -16.0328 11.8228 False
T01_Mara T08_Karina -5.495 0.9713 -23.4757 12.4857 False
T02_Laura T03_Nancy 2.845 0.9993 -14.526 20.216 False
T02_Laura T04_Michelle -0.5625 1.0 -17.9335 16.8085 False
T02_Laura T05_Felicitas 15.71 0.0593 -0.3724 31.7924 False
T02_Laura T06_Mara_Selena 1.215 1.0 -14.8674 17.2974 False
T02_Laura T07_Geraldinn 0.455 1.0 -16.916 17.826 False
T02_Laura T08_Karina -2.935 0.9997 -23.6973 17.8273 False
T03_Nancy T04_Michelle -3.4075 0.9966 -19.4899 12.6749 False
T03_Nancy T05_Felicitas 12.865 0.1193 -1.8161 27.5461 False
T03_Nancy T06_Mara_Selena -1.63 1.0 -16.3111 13.0511 False
T03_Nancy T07_Geraldinn -2.39 0.9997 -18.4724 13.6924 False
T03_Nancy T08_Karina -5.78 0.977 -25.4768 13.9168 False
T04_Michelle T05_Felicitas 16.2725 0.0218 1.5914 30.9536 True
T04_Michelle T06_Mara_Selena 1.7775 0.9999 -12.9036 16.4586 False
T04_Michelle T07_Geraldinn 1.0175 1.0 -15.0649 17.0999 False
T04_Michelle T08_Karina -2.3725 0.9999 -22.0693 17.3243 False
T05_Felicitas T06_Mara_Selena -14.495 0.0226 -27.6262 -1.3638 True
T05_Felicitas T07_Geraldinn -15.255 0.0373 -29.9361 -0.5739 True
T05_Felicitas T08_Karina -18.645 0.0485 -37.2153 -0.0747 True
T06_Mara_Selena T07_Geraldinn -0.76 1.0 -15.4411 13.9211 False
T06_Mara_Selena T08_Karina -4.15 0.9953 -22.7203 14.4203 False
T07_Geraldinn T08_Karina -3.39 0.9991 -23.0868 16.3068 False
-----------------------------------------------------------------------
Significant differences for PNNI_50:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=====================================================================
group1 group2 meandiff p-adj lower upper reject
---------------------------------------------------------------------
T01_Mara T02_Laura -0.1483 1.0 -4.063 3.7663 False
T01_Mara T03_Nancy 0.3775 1.0 -3.1634 3.9184 False
T01_Mara T04_Michelle -0.3125 1.0 -3.8534 3.2284 False
T01_Mara T05_Felicitas 3.225 0.0391 0.1022 6.3478 True
T01_Mara T06_Mara_Selena 0.2717 1.0 -2.8511 3.3945 False
T01_Mara T07_Geraldinn -0.2375 1.0 -3.7784 3.3034 False
T01_Mara T08_Karina -0.415 1.0 -4.9863 4.1563 False
T02_Laura T03_Nancy 0.5258 0.9999 -3.8905 4.9421 False
T02_Laura T04_Michelle -0.1642 1.0 -4.5805 4.2521 False
T02_Laura T05_Felicitas 3.3733 0.1658 -0.7154 7.462 False
T02_Laura T06_Mara_Selena 0.42 1.0 -3.6687 4.5087 False
T02_Laura T07_Geraldinn -0.0892 1.0 -4.5055 4.3271 False
T02_Laura T08_Karina -0.2667 1.0 -5.5451 5.0118 False
T03_Nancy T04_Michelle -0.69 0.9992 -4.7787 3.3987 False
T03_Nancy T05_Felicitas 2.8475 0.2399 -0.8849 6.5799 False
T03_Nancy T06_Mara_Selena -0.1058 1.0 -3.8383 3.6266 False
T03_Nancy T07_Geraldinn -0.615 0.9996 -4.7037 3.4737 False
T03_Nancy T08_Karina -0.7925 0.9995 -5.8001 4.2151 False
T04_Michelle T05_Felicitas 3.5375 0.073 -0.1949 7.2699 False
T04_Michelle T06_Mara_Selena 0.5842 0.9995 -3.1483 4.3166 False
T04_Michelle T07_Geraldinn 0.075 1.0 -4.0137 4.1637 False
T04_Michelle T08_Karina -0.1025 1.0 -5.1101 4.9051 False
T05_Felicitas T06_Mara_Selena -2.9533 0.1128 -6.2917 0.3851 False
T05_Felicitas T07_Geraldinn -3.4625 0.0841 -7.1949 0.2699 False
T05_Felicitas T08_Karina -3.64 0.2291 -8.3612 1.0812 False
T06_Mara_Selena T07_Geraldinn -0.5092 0.9998 -4.2416 3.2233 False
T06_Mara_Selena T08_Karina -0.6867 0.9997 -5.4079 4.0345 False
T07_Geraldinn T08_Karina -0.1775 1.0 -5.1851 4.8301 False
---------------------------------------------------------------------
Significant differences for PNNI_20:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura 0.6396 1.0 -10.921 12.2002 False
T01_Mara T03_Nancy 3.3688 0.9619 -7.0882 13.8257 False
T01_Mara T04_Michelle 0.2713 1.0 -10.1857 10.7282 False
T01_Mara T05_Felicitas 9.8429 0.03 0.6207 19.0651 True
T01_Mara T06_Mara_Selena 1.5379 0.9993 -7.6843 10.7601 False
T01_Mara T07_Geraldinn 1.3563 0.9999 -9.1007 11.8132 False
T01_Mara T08_Karina -1.6837 0.9999 -15.1836 11.8161 False
T02_Laura T03_Nancy 2.7292 0.9969 -10.313 15.7713 False
T02_Laura T04_Michelle -0.3683 1.0 -13.4105 12.6738 False
T02_Laura T05_Felicitas 9.2033 0.2408 -2.8713 21.278 False
T02_Laura T06_Mara_Selena 0.8983 1.0 -11.1763 12.973 False
T02_Laura T07_Geraldinn 0.7167 1.0 -12.3255 13.7588 False
T02_Laura T08_Karina -2.3233 0.9996 -17.9117 13.265 False
T03_Nancy T04_Michelle -3.0975 0.9893 -15.1722 8.9772 False
T03_Nancy T05_Felicitas 6.4742 0.5518 -4.5484 17.4968 False
T03_Nancy T06_Mara_Selena -1.8308 0.9993 -12.8534 9.1918 False
T03_Nancy T07_Geraldinn -2.0125 0.9993 -14.0872 10.0622 False
T03_Nancy T08_Karina -5.0525 0.9483 -19.8409 9.7359 False
T04_Michelle T05_Felicitas 9.5717 0.1257 -1.4509 20.5943 False
T04_Michelle T06_Mara_Selena 1.2667 0.9999 -9.7559 12.2893 False
T04_Michelle T07_Geraldinn 1.085 1.0 -10.9897 13.1597 False
T04_Michelle T08_Karina -1.955 0.9998 -16.7434 12.8334 False
T05_Felicitas T06_Mara_Selena -8.305 0.1487 -18.1639 1.5539 False
T05_Felicitas T07_Geraldinn -8.4867 0.2305 -19.5093 2.5359 False
T05_Felicitas T08_Karina -11.5267 0.164 -25.4693 2.416 False
T06_Mara_Selena T07_Geraldinn -0.1817 1.0 -11.2043 10.8409 False
T06_Mara_Selena T08_Karina -3.2217 0.9943 -17.1643 10.721 False
T07_Geraldinn T08_Karina -3.04 0.9972 -17.8284 11.7484 False
-----------------------------------------------------------------------
Significant differences for HFNU:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura -2.7446 0.8683 -9.3362 3.847 False
T01_Mara T03_Nancy -4.5238 0.2456 -10.4861 1.4386 False
T01_Mara T04_Michelle -6.6712 0.02 -12.6336 -0.7089 True
T01_Mara T05_Felicitas -2.1363 0.8818 -7.3945 3.122 False
T01_Mara T06_Mara_Selena -6.3529 0.0097 -11.6112 -1.0947 True
T01_Mara T07_Geraldinn -4.8713 0.1742 -10.8336 1.0911 False
T01_Mara T08_Karina -5.7813 0.2563 -13.4786 1.9161 False
T02_Laura T03_Nancy -1.7792 0.9929 -9.2155 5.6571 False
T02_Laura T04_Michelle -3.9267 0.6735 -11.363 3.5096 False
T02_Laura T05_Felicitas 0.6083 1.0 -6.2764 7.493 False
T02_Laura T06_Mara_Selena -3.6083 0.6814 -10.493 3.2764 False
T02_Laura T07_Geraldinn -2.1267 0.9801 -9.563 5.3096 False
T02_Laura T08_Karina -3.0367 0.9483 -11.9248 5.8514 False
T03_Nancy T04_Michelle -2.1475 0.9679 -9.0322 4.7372 False
T03_Nancy T05_Felicitas 2.3875 0.9131 -3.8973 8.6723 False
T03_Nancy T06_Mara_Selena -1.8292 0.978 -8.114 4.4557 False
T03_Nancy T07_Geraldinn -0.3475 1.0 -7.2322 6.5372 False
T03_Nancy T08_Karina -1.2575 0.9996 -9.6895 7.1745 False
T04_Michelle T05_Felicitas 4.535 0.3007 -1.7498 10.8198 False
T04_Michelle T06_Mara_Selena 0.3183 1.0 -5.9665 6.6032 False
T04_Michelle T07_Geraldinn 1.8 0.988 -5.0847 8.6847 False
T04_Michelle T08_Karina 0.89 1.0 -7.542 9.322 False
T05_Felicitas T06_Mara_Selena -4.2167 0.2577 -9.838 1.4047 False
T05_Felicitas T07_Geraldinn -2.735 0.841 -9.0198 3.5498 False
T05_Felicitas T08_Karina -3.645 0.8036 -11.5947 4.3047 False
T06_Mara_Selena T07_Geraldinn 1.4817 0.9935 -4.8032 7.7665 False
T06_Mara_Selena T08_Karina 0.5717 1.0 -7.3781 8.5214 False
T07_Geraldinn T08_Karina -0.91 1.0 -9.342 7.522 False
-----------------------------------------------------------------------
Significant differences for SD1:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura -1.8121 0.9993 -12.7001 9.0759 False
T01_Mara T03_Nancy 0.2012 1.0 -9.6473 10.0498 False
T01_Mara T04_Michelle -2.2062 0.9952 -12.0548 7.6423 False
T01_Mara T05_Felicitas 9.2979 0.0293 0.6123 17.9835 True
T01_Mara T06_Mara_Selena -0.9521 1.0 -9.6377 7.7335 False
T01_Mara T07_Geraldinn -1.4888 0.9996 -11.3373 8.3598 False
T01_Mara T08_Karina -3.8837 0.9713 -16.5982 8.8307 False
T02_Laura T03_Nancy 2.0133 0.9993 -10.27 14.2966 False
T02_Laura T04_Michelle -0.3942 1.0 -12.6775 11.8891 False
T02_Laura T05_Felicitas 11.11 0.0592 -0.2621 22.4821 False
T02_Laura T06_Mara_Selena 0.86 1.0 -10.5121 12.2321 False
T02_Laura T07_Geraldinn 0.3233 1.0 -11.96 12.6066 False
T02_Laura T08_Karina -2.0717 0.9998 -16.753 12.6097 False
T03_Nancy T04_Michelle -2.4075 0.9966 -13.7796 8.9646 False
T03_Nancy T05_Felicitas 9.0967 0.1193 -1.2846 19.4779 False
T03_Nancy T06_Mara_Selena -1.1533 1.0 -11.5346 9.2279 False
T03_Nancy T07_Geraldinn -1.69 0.9997 -13.0621 9.6821 False
T03_Nancy T08_Karina -4.085 0.9771 -18.0129 9.8429 False
T04_Michelle T05_Felicitas 11.5042 0.0218 1.1229 21.8854 True
T04_Michelle T06_Mara_Selena 1.2542 0.9999 -9.1271 11.6354 False
T04_Michelle T07_Geraldinn 0.7175 1.0 -10.6546 12.0896 False
T04_Michelle T08_Karina -1.6775 0.9999 -15.6054 12.2504 False
T05_Felicitas T06_Mara_Selena -10.25 0.0226 -19.5353 -0.9647 True
T05_Felicitas T07_Geraldinn -10.7867 0.0373 -21.1679 -0.4054 True
T05_Felicitas T08_Karina -13.1817 0.0486 -26.3131 -0.0503 True
T06_Mara_Selena T07_Geraldinn -0.5367 1.0 -10.9179 9.8446 False
T06_Mara_Selena T08_Karina -2.9317 0.9953 -16.0631 10.1997 False
T07_Geraldinn T08_Karina -2.395 0.9991 -16.3229 11.5329 False
-----------------------------------------------------------------------
================================================================================
END OF REPORT
================================================================================
# =================================================================================
# Cell: Comprehensive Parasympathetic and GAM Time Series Analysis
# =================================================================================
# This notebook cell provides a complete, in-depth analysis of parasympathetic
# nervous system activity, including Generalized Additive Models (GAMs)
# to capture non-linear trends over time.
#
# The analysis is based on scientifically validated Heart Rate Variability (HRV) metrics.
#
# It will perform the following steps:
# 1. Load and prepare the complete HRV dataset from the specified hardcoded path.
# 2. Perform detailed statistical analysis (ANOVA, Correlation, Tukey's HSD).
# 3. Fit a Generalized Additive Model (GAM) for each metric to visualize
# non-linear changes over the mission.
# 4. Generate and display comprehensive visualizations for all analyses.
# 5. Generate and print a full scientific report summarizing all findings.
# =================================================================================
# --- 1. Imports and Setup ---
# Import all necessary libraries for data handling, statistics, and plotting.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import f_oneway, pearsonr, spearmanr
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from pygam import GAM, s, f # Import for Generalized Additive Models
import warnings
# Ignore common warnings for a cleaner, more readable output.
warnings.filterwarnings('ignore')
# --- 2. Plotting Style Configuration ---
# Set a professional and consistent style for all plots.
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl", 8)
plt.rcParams['figure.figsize'] = (16, 10)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.titlesize'] = 16
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 11
plt.rcParams['figure.dpi'] = 100
# --- 3. The ParasympatheticAnalyzer Class ---
# The entire analysis is encapsulated within this class for organization and reusability.
class ParasympatheticAnalyzer:
"""
A comprehensive analyzer for parasympathetic nervous system activity
in space crew members using validated HRV metrics and advanced modeling.
"""
# Using the exact hardcoded path as requested.
def __init__(self, data_path=(r"C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv")):
"""
Initializes the analyzer by loading and preparing the HRV data from the specified path.
"""
self.data_path = data_path
self.data = None
self.parasympathetic_metrics = ['rmssd', 'pnni_50', 'pnni_20', 'hf', 'hfnu', 'sd1']
self.crew_names = {
'T01_Mara': 'Mara', 'T02_Laura': 'Laura', 'T03_Nancy': 'Nancy',
'T04_Michelle': 'Michelle', 'T05_Felicitas': 'Felicitas',
'T06_Mara_Selena': 'Mara Selena', 'T07_Geraldinn': 'Geraldinn', 'T08_Karina': 'Karina'
}
self.statistical_results = {}
self.load_data()
def load_data(self):
"""
Loads the data from the CSV file and prepares it for all analyses.
This now includes creating a numerical subject code required for pygam.
"""
print("--- Loading and Preparing Data ---")
try:
self.data = pd.read_csv(self.data_path)
self.data['Subject'] = self.data['Subject'].astype(str)
self.data['Sol'] = self.data['Sol'].astype(int)
self.data['Crew_Name'] = self.data['Subject'].map(self.crew_names)
self.data = self.data.sort_values(['Subject', 'Sol'])
self.data['Mission_Phase'] = self.data['Sol'].apply(self._categorize_mission_phase)
self.data['Subject_code'] = self.data['Subject'].astype('category').cat.codes
print(f"✓ Data loaded successfully from {self.data_path}")
print(f"✓ Data shape: {self.data.shape}")
print(f"✓ Crew members found: {len(self.data['Subject'].unique())}")
print(f"✓ Mission Day (Sol) range: {self.data['Sol'].min()} to {self.data['Sol'].max()}")
except FileNotFoundError:
print(f"✗ ERROR: Data file not found at the specified path: '{self.data_path}'")
print(" Please ensure the path is correct.")
except Exception as e:
print(f"✗ An unexpected error occurred during data loading: {e}")
def _categorize_mission_phase(self, sol):
"""Helper function to categorize mission days into distinct phases."""
if sol <= 5:
return 'Early Mission (Sols 1-5)'
elif sol <= 10:
return 'Mid Mission (Sols 6-10)'
else:
return 'Late Mission (Sols 11+)'
def perform_statistical_analysis(self):
"""Performs ANOVA, Correlation, and Tukey's HSD tests."""
print("\n\n--- Performing Comprehensive Statistical Analysis ---")
results = {}
print("\n[1] One-Way ANOVA: Testing for differences between crew members...")
for metric in self.parasympathetic_metrics:
if metric in self.data.columns:
groups = [self.data[self.data['Subject'] == crew][metric].dropna() for crew in self.data['Subject'].unique()]
groups = [g for g in groups if len(g) > 0]
if len(groups) >= 2:
f_stat, p_value = f_oneway(*groups)
results[f'anova_{metric}'] = {'F-statistic': f_stat, 'p-value': p_value}
sig_marker = '***' if p_value < 0.001 else '**' if p_value < 0.01 else '*' if p_value < 0.05 else 'ns (not significant)'
print(f" - {metric.upper():<8}: F={f_stat:.3f}, p={p_value:.4f} ({sig_marker})")
print("\n[2] Correlation Analysis: Testing for trends over the mission duration (Sol)...")
for metric in self.parasympathetic_metrics:
if metric in self.data.columns:
clean_data = self.data[['Sol', metric]].dropna()
if len(clean_data) > 3:
r_pearson, p_pearson = pearsonr(clean_data['Sol'], clean_data[metric])
r_spearman, p_spearman = spearmanr(clean_data['Sol'], clean_data[metric])
results[f'corr_{metric}'] = {'pearson_r': r_pearson, 'pearson_p': p_pearson, 'spearman_r': r_spearman, 'spearman_p': p_spearman}
print(f" - {metric.upper():<8}: Spearman r={r_spearman:.3f} (p={p_spearman:.4f}), Pearson r={r_pearson:.3f} (p={p_pearson:.4f})")
print("\n[3] Post-Hoc Analysis (Tukey's HSD): Pinpointing specific crew differences...")
for metric in self.parasympathetic_metrics:
if f'anova_{metric}' in results and results[f'anova_{metric}']['p-value'] < 0.05:
print(f"\n - Tukey HSD results for {metric.upper()} (since ANOVA was significant):")
clean_data = self.data[[metric, 'Subject']].dropna()
tukey_result = pairwise_tukeyhsd(endog=clean_data[metric], groups=clean_data['Subject'], alpha=0.05)
results[f'tukey_{metric}'] = tukey_result
print(tukey_result)
self.statistical_results = results
print("\n✓ Statistical analysis complete.")
def run_generalized_additive_models(self):
"""
Fits, summarizes, and plots a Generalized Additive Model for each metric.
GAMs are powerful tools that can model complex, non-linear relationships.
"""
print("\n\n--- Running Generalized Additive Models (GAMs) ---")
print("This analysis captures non-linear trends over the mission duration.")
for metric in self.parasympathetic_metrics:
if metric not in self.data.columns:
print(f"\n--- Skipping {metric.upper()}: Column not found ---")
continue
print(f"\n--- Analyzing Metric: {metric.upper()} ---")
try:
model_data = self.data[['Sol', 'Subject', 'Subject_code', metric]].dropna()
if len(model_data['Subject_code'].unique()) < 2 or len(model_data) < 15:
print("Not enough data to robustly fit the model.")
continue
X = model_data[['Sol', 'Subject_code']]
y = model_data[metric]
gam = GAM(s(0, n_splines=10) + f(1)).fit(X, y)
print(gam.summary())
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
XX = gam.generate_X_grid(term=0)
pdep, confi = gam.partial_dependence(term=0, X=XX, width=0.95)
ax.plot(XX[:, 0], pdep, color='royalblue', linewidth=3, label='GAM Trend')
ax.fill_between(XX[:, 0], confi[:, 0], confi[:, 1], color='cornflowerblue', alpha=0.3, label='95% Confidence Interval')
sns.scatterplot(x='Sol', y=metric, hue='Subject', data=model_data, ax=ax, alpha=0.7, palette='husl', s=50)
ax.set_title(f"Non-Linear Trend for {metric.upper()} (GAM Analysis)", fontsize=16, fontweight='bold')
ax.set_xlabel("Sol (Mission Day)")
ax.set_ylabel(f"{metric.upper()} Value")
ax.grid(True, which='both', linestyle='--', linewidth=0.5)
ax.legend(title='Subject', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
except Exception as e:
print(f"Could not fit GAM for {metric}. An unexpected error occurred: {e}")
def plot_longitudinal_trends(self):
"""Creates longitudinal trend plots for each metric (linear trend)."""
print("\n--- Plotting Longitudinal Trends (Linear Fit) ---")
fig, axes = plt.subplots(3, 2, figsize=(20, 22))
axes = axes.flatten()
for i, metric in enumerate(self.parasympathetic_metrics):
ax = axes[i]
for subject_id, crew_data in self.data.groupby('Subject'):
ax.plot(crew_data['Sol'], crew_data[metric], marker='o', linestyle='-', alpha=0.8, label=self.crew_names[subject_id])
if len(crew_data['Sol'].dropna()) > 1:
z = np.polyfit(crew_data['Sol'].dropna(), crew_data[metric].dropna(), 1)
p = np.poly1d(z)
ax.plot(crew_data['Sol'], p(crew_data['Sol']), linestyle='--', alpha=0.6, color=ax.get_lines()[-1].get_color())
ax.set_title(f'Longitudinal Trend: {metric.upper()}', fontweight='bold')
ax.set_xlabel('Sol (Mission Day)')
ax.set_ylabel(f'{metric.upper()} Value')
ax.grid(True, which='both', linestyle='--', linewidth=0.5)
handles, labels = axes[0].get_legend_handles_labels()
unique_handles = [handles[i] for i in range(0, len(handles), 2)]
unique_labels = [labels[i] for i in range(0, len(labels), 2)]
fig.legend(unique_handles, unique_labels, loc='upper right', bbox_to_anchor=(1.1, 0.95), title="Crew Member")
plt.tight_layout(rect=[0, 0, 0.9, 1])
plt.suptitle('Parasympathetic HRV Metrics Over Mission Time (Linear Fit)', fontsize=20, y=1.02)
plt.show()
def plot_mission_phase_analysis(self):
"""Analyzes and visualizes activity across mission phases."""
print("\n--- Plotting Mission Phase Analysis ---")
fig, axes = plt.subplots(3, 2, figsize=(20, 22))
axes = axes.flatten()
phase_order = ['Early Mission (Sols 1-5)', 'Mid Mission (Sols 6-10)', 'Late Mission (Sols 11+)']
for i, metric in enumerate(self.parasympathetic_metrics):
ax = axes[i]
sns.boxplot(data=self.data, x='Mission_Phase', y=metric, ax=ax, order=phase_order, showfliers=False)
sns.stripplot(data=self.data, x='Mission_Phase', y=metric, ax=ax, order=phase_order, color='black', alpha=0.5, size=4)
ax.set_title(f'Analysis by Mission Phase: {metric.upper()}', fontweight='bold')
ax.set_xlabel('Mission Phase')
ax.set_ylabel(f'{metric.upper()} Value')
ax.tick_params(axis='x', rotation=10)
plt.tight_layout()
plt.suptitle('Parasympathetic HRV Metrics Across Mission Phases', fontsize=20, y=1.02)
plt.show()
def generate_scientific_report(self):
"""Generates a formatted scientific report summarizing all findings."""
print("\n\n" + "="*80)
print(" COMPREHENSIVE PARASYMPATHETIC NERVOUS SYSTEM REPORT")
print("="*80)
report = [
"\n### 1. INTRODUCTION ###",
"This report details the analysis of parasympathetic nervous system (PNS) activity in the space analog crew.",
"PNS activity, often termed the 'rest-and-digest' system, is a critical indicator of stress, recovery, and autonomic health.",
"The analysis utilizes established Heart Rate Variability (HRV) metrics to quantify PNS tone.",
"\n### 2. METHODOLOGY ###",
"The following validated HRV metrics were used to assess parasympathetic activity:",
"- RMSSD & SD1: Reflect short-term, beat-to-beat variability (vagal tone).",
"- pNN50/pNN20: Percentage of successive beats that differ by more than 50ms/20ms.",
"- HF/HFnu: High-frequency power, a direct marker of vagal modulation of the heart.",
"\nStatistical analyses included:",
" 1. One-Way ANOVA and Tukey's HSD for between-crew comparisons.",
" 2. Spearman/Pearson correlation to assess linear trends over time.",
" 3. Generalized Additive Models (GAMs) to identify non-linear trends over time.",
"\n### 3. STATISTICAL RESULTS ###",
"\n--- Between-Crew Differences (ANOVA) ---",
"This test checks if there are significant differences in the average metric values among crew members over the entire mission.",
]
for metric in self.parasympathetic_metrics:
if f'anova_{metric}' in self.statistical_results:
res = self.statistical_results[f'anova_{metric}']
sig = 'p < 0.05, SIGNIFICANT' if res['p-value'] < 0.05 else 'p > 0.05, not significant'
report.append(f" - {metric.upper()}: F={res['F-statistic']:.2f}, p={res['p-value']:.3f} ({sig})")
report.append("\n--- Mission Time Trend Analysis (Correlation) ---")
report.append("This test checks if metrics generally increased or decreased over the course of the mission for the crew as a whole.")
for metric in self.parasympathetic_metrics:
if f'corr_{metric}' in self.statistical_results:
res = self.statistical_results[f'corr_{metric}']
sig = 'p < 0.05, SIGNIFICANT' if res['spearman_p'] < 0.05 else 'p > 0.05, not significant'
report.append(f" - {metric.upper()}: Spearman r={res['spearman_r']:.3f} (p={res['spearman_p']:.3f}, {sig})")
report.append("\n--- Non-Linear Trend Analysis (GAM) ---")
report.append("GAMs were used to explore complex, non-linear patterns in HRV metrics over the mission duration.")
report.append("The plots generated by this analysis visualize the average trend for the crew, revealing periods of increase, decrease, or stability that a simple linear model might miss.")
report.append("Interpretation should be based on visual inspection of the GAM plots: a significant curve or 'wiggle' suggests a non-linear relationship with time.")
report.append("\n--- Pairwise Crew Comparisons (Tukey's HSD) ---")
report.append("For metrics where ANOVA was significant, this test identifies which specific crew members differed from each other.")
for metric in self.parasympathetic_metrics:
if f'tukey_{metric}' in self.statistical_results:
report.append(f"\n Significant differences for {metric.upper()}:")
report.append(str(self.statistical_results[f'tukey_{metric}']))
print("\n".join(report))
print("\n" + "="*80)
print(" END OF REPORT")
print("="*80)
# --- 4. Main Execution Block ---
# Create an instance of the analyzer and run all analysis methods.
analyzer = ParasympatheticAnalyzer()
if analyzer.data is not None:
analyzer.perform_statistical_analysis()
analyzer.plot_longitudinal_trends()
analyzer.plot_mission_phase_analysis()
analyzer.run_generalized_additive_models() # Run the new GAM analysis
analyzer.generate_scientific_report()
else:
print("\nAnalysis could not proceed because data failed to load.")
--- Loading and Preparing Data ---
✓ Data loaded successfully from C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv
✓ Data shape: (37, 36)
✓ Crew members found: 8
✓ Mission Day (Sol) range: 2 to 16
--- Performing Comprehensive Statistical Analysis ---
[1] One-Way ANOVA: Testing for differences between crew members...
- RMSSD : F=3.400, p=0.0090 (**)
- PNNI_50 : F=2.485, p=0.0396 (*)
- PNNI_20 : F=2.354, p=0.0493 (*)
- HF : F=2.320, p=0.0523 (ns (not significant))
- HFNU : F=3.547, p=0.0071 (**)
- SD1 : F=3.399, p=0.0090 (**)
[2] Correlation Analysis: Testing for trends over the mission duration (Sol)...
- RMSSD : Spearman r=-0.279 (p=0.0945), Pearson r=-0.118 (p=0.4856)
- PNNI_50 : Spearman r=-0.308 (p=0.0640), Pearson r=-0.099 (p=0.5607)
- PNNI_20 : Spearman r=-0.241 (p=0.1500), Pearson r=-0.135 (p=0.4271)
- HF : Spearman r=-0.214 (p=0.2033), Pearson r=-0.001 (p=0.9933)
- HFNU : Spearman r=-0.318 (p=0.0553), Pearson r=-0.279 (p=0.0946)
- SD1 : Spearman r=-0.278 (p=0.0962), Pearson r=-0.118 (p=0.4855)
[3] Post-Hoc Analysis (Tukey's HSD): Pinpointing specific crew differences...
- Tukey HSD results for RMSSD (since ANOVA was significant):
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura -2.56 0.9993 -17.9577 12.8377 False
T01_Mara T03_Nancy 0.285 1.0 -13.6428 14.2128 False
T01_Mara T04_Michelle -3.1225 0.9952 -17.0503 10.8053 False
T01_Mara T05_Felicitas 13.15 0.0293 0.8669 25.4331 True
T01_Mara T06_Mara_Selena -1.345 1.0 -13.6281 10.9381 False
T01_Mara T07_Geraldinn -2.105 0.9996 -16.0328 11.8228 False
T01_Mara T08_Karina -5.495 0.9713 -23.4757 12.4857 False
T02_Laura T03_Nancy 2.845 0.9993 -14.526 20.216 False
T02_Laura T04_Michelle -0.5625 1.0 -17.9335 16.8085 False
T02_Laura T05_Felicitas 15.71 0.0593 -0.3724 31.7924 False
T02_Laura T06_Mara_Selena 1.215 1.0 -14.8674 17.2974 False
T02_Laura T07_Geraldinn 0.455 1.0 -16.916 17.826 False
T02_Laura T08_Karina -2.935 0.9997 -23.6973 17.8273 False
T03_Nancy T04_Michelle -3.4075 0.9966 -19.4899 12.6749 False
T03_Nancy T05_Felicitas 12.865 0.1193 -1.8161 27.5461 False
T03_Nancy T06_Mara_Selena -1.63 1.0 -16.3111 13.0511 False
T03_Nancy T07_Geraldinn -2.39 0.9997 -18.4724 13.6924 False
T03_Nancy T08_Karina -5.78 0.977 -25.4768 13.9168 False
T04_Michelle T05_Felicitas 16.2725 0.0218 1.5914 30.9536 True
T04_Michelle T06_Mara_Selena 1.7775 0.9999 -12.9036 16.4586 False
T04_Michelle T07_Geraldinn 1.0175 1.0 -15.0649 17.0999 False
T04_Michelle T08_Karina -2.3725 0.9999 -22.0693 17.3243 False
T05_Felicitas T06_Mara_Selena -14.495 0.0226 -27.6262 -1.3638 True
T05_Felicitas T07_Geraldinn -15.255 0.0373 -29.9361 -0.5739 True
T05_Felicitas T08_Karina -18.645 0.0485 -37.2153 -0.0747 True
T06_Mara_Selena T07_Geraldinn -0.76 1.0 -15.4411 13.9211 False
T06_Mara_Selena T08_Karina -4.15 0.9953 -22.7203 14.4203 False
T07_Geraldinn T08_Karina -3.39 0.9991 -23.0868 16.3068 False
-----------------------------------------------------------------------
- Tukey HSD results for PNNI_50 (since ANOVA was significant):
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=====================================================================
group1 group2 meandiff p-adj lower upper reject
---------------------------------------------------------------------
T01_Mara T02_Laura -0.1483 1.0 -4.063 3.7663 False
T01_Mara T03_Nancy 0.3775 1.0 -3.1634 3.9184 False
T01_Mara T04_Michelle -0.3125 1.0 -3.8534 3.2284 False
T01_Mara T05_Felicitas 3.225 0.0391 0.1022 6.3478 True
T01_Mara T06_Mara_Selena 0.2717 1.0 -2.8511 3.3945 False
T01_Mara T07_Geraldinn -0.2375 1.0 -3.7784 3.3034 False
T01_Mara T08_Karina -0.415 1.0 -4.9863 4.1563 False
T02_Laura T03_Nancy 0.5258 0.9999 -3.8905 4.9421 False
T02_Laura T04_Michelle -0.1642 1.0 -4.5805 4.2521 False
T02_Laura T05_Felicitas 3.3733 0.1658 -0.7154 7.462 False
T02_Laura T06_Mara_Selena 0.42 1.0 -3.6687 4.5087 False
T02_Laura T07_Geraldinn -0.0892 1.0 -4.5055 4.3271 False
T02_Laura T08_Karina -0.2667 1.0 -5.5451 5.0118 False
T03_Nancy T04_Michelle -0.69 0.9992 -4.7787 3.3987 False
T03_Nancy T05_Felicitas 2.8475 0.2399 -0.8849 6.5799 False
T03_Nancy T06_Mara_Selena -0.1058 1.0 -3.8383 3.6266 False
T03_Nancy T07_Geraldinn -0.615 0.9996 -4.7037 3.4737 False
T03_Nancy T08_Karina -0.7925 0.9995 -5.8001 4.2151 False
T04_Michelle T05_Felicitas 3.5375 0.073 -0.1949 7.2699 False
T04_Michelle T06_Mara_Selena 0.5842 0.9995 -3.1483 4.3166 False
T04_Michelle T07_Geraldinn 0.075 1.0 -4.0137 4.1637 False
T04_Michelle T08_Karina -0.1025 1.0 -5.1101 4.9051 False
T05_Felicitas T06_Mara_Selena -2.9533 0.1128 -6.2917 0.3851 False
T05_Felicitas T07_Geraldinn -3.4625 0.0841 -7.1949 0.2699 False
T05_Felicitas T08_Karina -3.64 0.2291 -8.3612 1.0812 False
T06_Mara_Selena T07_Geraldinn -0.5092 0.9998 -4.2416 3.2233 False
T06_Mara_Selena T08_Karina -0.6867 0.9997 -5.4079 4.0345 False
T07_Geraldinn T08_Karina -0.1775 1.0 -5.1851 4.8301 False
---------------------------------------------------------------------
- Tukey HSD results for PNNI_20 (since ANOVA was significant):
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura 0.6396 1.0 -10.921 12.2002 False
T01_Mara T03_Nancy 3.3688 0.9619 -7.0882 13.8257 False
T01_Mara T04_Michelle 0.2713 1.0 -10.1857 10.7282 False
T01_Mara T05_Felicitas 9.8429 0.03 0.6207 19.0651 True
T01_Mara T06_Mara_Selena 1.5379 0.9993 -7.6843 10.7601 False
T01_Mara T07_Geraldinn 1.3563 0.9999 -9.1007 11.8132 False
T01_Mara T08_Karina -1.6837 0.9999 -15.1836 11.8161 False
T02_Laura T03_Nancy 2.7292 0.9969 -10.313 15.7713 False
T02_Laura T04_Michelle -0.3683 1.0 -13.4105 12.6738 False
T02_Laura T05_Felicitas 9.2033 0.2408 -2.8713 21.278 False
T02_Laura T06_Mara_Selena 0.8983 1.0 -11.1763 12.973 False
T02_Laura T07_Geraldinn 0.7167 1.0 -12.3255 13.7588 False
T02_Laura T08_Karina -2.3233 0.9996 -17.9117 13.265 False
T03_Nancy T04_Michelle -3.0975 0.9893 -15.1722 8.9772 False
T03_Nancy T05_Felicitas 6.4742 0.5518 -4.5484 17.4968 False
T03_Nancy T06_Mara_Selena -1.8308 0.9993 -12.8534 9.1918 False
T03_Nancy T07_Geraldinn -2.0125 0.9993 -14.0872 10.0622 False
T03_Nancy T08_Karina -5.0525 0.9483 -19.8409 9.7359 False
T04_Michelle T05_Felicitas 9.5717 0.1257 -1.4509 20.5943 False
T04_Michelle T06_Mara_Selena 1.2667 0.9999 -9.7559 12.2893 False
T04_Michelle T07_Geraldinn 1.085 1.0 -10.9897 13.1597 False
T04_Michelle T08_Karina -1.955 0.9998 -16.7434 12.8334 False
T05_Felicitas T06_Mara_Selena -8.305 0.1487 -18.1639 1.5539 False
T05_Felicitas T07_Geraldinn -8.4867 0.2305 -19.5093 2.5359 False
T05_Felicitas T08_Karina -11.5267 0.164 -25.4693 2.416 False
T06_Mara_Selena T07_Geraldinn -0.1817 1.0 -11.2043 10.8409 False
T06_Mara_Selena T08_Karina -3.2217 0.9943 -17.1643 10.721 False
T07_Geraldinn T08_Karina -3.04 0.9972 -17.8284 11.7484 False
-----------------------------------------------------------------------
- Tukey HSD results for HFNU (since ANOVA was significant):
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura -2.7446 0.8683 -9.3362 3.847 False
T01_Mara T03_Nancy -4.5238 0.2456 -10.4861 1.4386 False
T01_Mara T04_Michelle -6.6712 0.02 -12.6336 -0.7089 True
T01_Mara T05_Felicitas -2.1363 0.8818 -7.3945 3.122 False
T01_Mara T06_Mara_Selena -6.3529 0.0097 -11.6112 -1.0947 True
T01_Mara T07_Geraldinn -4.8713 0.1742 -10.8336 1.0911 False
T01_Mara T08_Karina -5.7813 0.2563 -13.4786 1.9161 False
T02_Laura T03_Nancy -1.7792 0.9929 -9.2155 5.6571 False
T02_Laura T04_Michelle -3.9267 0.6735 -11.363 3.5096 False
T02_Laura T05_Felicitas 0.6083 1.0 -6.2764 7.493 False
T02_Laura T06_Mara_Selena -3.6083 0.6814 -10.493 3.2764 False
T02_Laura T07_Geraldinn -2.1267 0.9801 -9.563 5.3096 False
T02_Laura T08_Karina -3.0367 0.9483 -11.9248 5.8514 False
T03_Nancy T04_Michelle -2.1475 0.9679 -9.0322 4.7372 False
T03_Nancy T05_Felicitas 2.3875 0.9131 -3.8973 8.6723 False
T03_Nancy T06_Mara_Selena -1.8292 0.978 -8.114 4.4557 False
T03_Nancy T07_Geraldinn -0.3475 1.0 -7.2322 6.5372 False
T03_Nancy T08_Karina -1.2575 0.9996 -9.6895 7.1745 False
T04_Michelle T05_Felicitas 4.535 0.3007 -1.7498 10.8198 False
T04_Michelle T06_Mara_Selena 0.3183 1.0 -5.9665 6.6032 False
T04_Michelle T07_Geraldinn 1.8 0.988 -5.0847 8.6847 False
T04_Michelle T08_Karina 0.89 1.0 -7.542 9.322 False
T05_Felicitas T06_Mara_Selena -4.2167 0.2577 -9.838 1.4047 False
T05_Felicitas T07_Geraldinn -2.735 0.841 -9.0198 3.5498 False
T05_Felicitas T08_Karina -3.645 0.8036 -11.5947 4.3047 False
T06_Mara_Selena T07_Geraldinn 1.4817 0.9935 -4.8032 7.7665 False
T06_Mara_Selena T08_Karina 0.5717 1.0 -7.3781 8.5214 False
T07_Geraldinn T08_Karina -0.91 1.0 -9.342 7.522 False
-----------------------------------------------------------------------
- Tukey HSD results for SD1 (since ANOVA was significant):
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura -1.8121 0.9993 -12.7001 9.0759 False
T01_Mara T03_Nancy 0.2012 1.0 -9.6473 10.0498 False
T01_Mara T04_Michelle -2.2062 0.9952 -12.0548 7.6423 False
T01_Mara T05_Felicitas 9.2979 0.0293 0.6123 17.9835 True
T01_Mara T06_Mara_Selena -0.9521 1.0 -9.6377 7.7335 False
T01_Mara T07_Geraldinn -1.4888 0.9996 -11.3373 8.3598 False
T01_Mara T08_Karina -3.8837 0.9713 -16.5982 8.8307 False
T02_Laura T03_Nancy 2.0133 0.9993 -10.27 14.2966 False
T02_Laura T04_Michelle -0.3942 1.0 -12.6775 11.8891 False
T02_Laura T05_Felicitas 11.11 0.0592 -0.2621 22.4821 False
T02_Laura T06_Mara_Selena 0.86 1.0 -10.5121 12.2321 False
T02_Laura T07_Geraldinn 0.3233 1.0 -11.96 12.6066 False
T02_Laura T08_Karina -2.0717 0.9998 -16.753 12.6097 False
T03_Nancy T04_Michelle -2.4075 0.9966 -13.7796 8.9646 False
T03_Nancy T05_Felicitas 9.0967 0.1193 -1.2846 19.4779 False
T03_Nancy T06_Mara_Selena -1.1533 1.0 -11.5346 9.2279 False
T03_Nancy T07_Geraldinn -1.69 0.9997 -13.0621 9.6821 False
T03_Nancy T08_Karina -4.085 0.9771 -18.0129 9.8429 False
T04_Michelle T05_Felicitas 11.5042 0.0218 1.1229 21.8854 True
T04_Michelle T06_Mara_Selena 1.2542 0.9999 -9.1271 11.6354 False
T04_Michelle T07_Geraldinn 0.7175 1.0 -10.6546 12.0896 False
T04_Michelle T08_Karina -1.6775 0.9999 -15.6054 12.2504 False
T05_Felicitas T06_Mara_Selena -10.25 0.0226 -19.5353 -0.9647 True
T05_Felicitas T07_Geraldinn -10.7867 0.0373 -21.1679 -0.4054 True
T05_Felicitas T08_Karina -13.1817 0.0486 -26.3131 -0.0503 True
T06_Mara_Selena T07_Geraldinn -0.5367 1.0 -10.9179 9.8446 False
T06_Mara_Selena T08_Karina -2.9317 0.9953 -16.0631 10.1997 False
T07_Geraldinn T08_Karina -2.395 0.9991 -16.3229 11.5329 False
-----------------------------------------------------------------------
✓ Statistical analysis complete.
--- Plotting Longitudinal Trends (Linear Fit) ---
--- Plotting Mission Phase Analysis ---
--- Running Generalized Additive Models (GAMs) ---
This analysis captures non-linear trends over the mission duration.
--- Analyzing Metric: RMSSD ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -178.2912
Number of Samples: 37 AIC: 379.8952
AICc: 392.0156
GCV: 98.0213
Scale: 49.0325
Pseudo R-Squared: 0.4969
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 7.65e-01
f(1) [0.6] 8 5.3 8.65e-03 **
intercept 1 0.0 2.03e-06 ***
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
None
--- Analyzing Metric: PNNI_50 ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -80.8561
Number of Samples: 37 AIC: 185.025
AICc: 197.1454
GCV: 6.3395
Scale: 3.1712
Pseudo R-Squared: 0.4271
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 8.94e-01
f(1) [0.6] 8 5.3 4.11e-02 *
intercept 1 0.0 1.70e-01
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
None
--- Analyzing Metric: PNNI_20 ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -156.9732
Number of Samples: 37 AIC: 337.2592
AICc: 349.3795
GCV: 54.7781
Scale: 27.4013
Pseudo R-Squared: 0.4209
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 9.49e-01
f(1) [0.6] 8 5.3 4.71e-02 *
intercept 1 0.0 2.86e-03 **
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
None
--- Analyzing Metric: HF ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -352.976
Number of Samples: 37 AIC: 729.2647
AICc: 741.3851
GCV: 11087.6823
Scale: 5546.3138
Pseudo R-Squared: 0.4016
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 8.62e-01
f(1) [0.6] 8 5.3 5.52e-02 .
intercept 1 0.0 1.63e-01
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
None
--- Analyzing Metric: HFNU ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -110.8589
Number of Samples: 37 AIC: 245.0305
AICc: 257.1509
GCV: 15.2293
Scale: 7.618
Pseudo R-Squared: 0.5816
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 8.23e-02 .
f(1) [0.6] 8 5.3 4.28e-03 **
intercept 1 0.0 1.88e-09 ***
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
None
--- Analyzing Metric: SD1 ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -152.9113
Number of Samples: 37 AIC: 329.1353
AICc: 341.2557
GCV: 49.0078
Scale: 24.5148
Pseudo R-Squared: 0.4969
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 7.65e-01
f(1) [0.6] 8 5.3 8.65e-03 **
intercept 1 0.0 2.03e-06 ***
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
None
================================================================================
COMPREHENSIVE PARASYMPATHETIC NERVOUS SYSTEM REPORT
================================================================================
### 1. INTRODUCTION ###
This report details the analysis of parasympathetic nervous system (PNS) activity in the space analog crew.
PNS activity, often termed the 'rest-and-digest' system, is a critical indicator of stress, recovery, and autonomic health.
The analysis utilizes established Heart Rate Variability (HRV) metrics to quantify PNS tone.
### 2. METHODOLOGY ###
The following validated HRV metrics were used to assess parasympathetic activity:
- RMSSD & SD1: Reflect short-term, beat-to-beat variability (vagal tone).
- pNN50/pNN20: Percentage of successive beats that differ by more than 50ms/20ms.
- HF/HFnu: High-frequency power, a direct marker of vagal modulation of the heart.
Statistical analyses included:
1. One-Way ANOVA and Tukey's HSD for between-crew comparisons.
2. Spearman/Pearson correlation to assess linear trends over time.
3. Generalized Additive Models (GAMs) to identify non-linear trends over time.
### 3. STATISTICAL RESULTS ###
--- Between-Crew Differences (ANOVA) ---
This test checks if there are significant differences in the average metric values among crew members over the entire mission.
- RMSSD: F=3.40, p=0.009 (p < 0.05, SIGNIFICANT)
- PNNI_50: F=2.49, p=0.040 (p < 0.05, SIGNIFICANT)
- PNNI_20: F=2.35, p=0.049 (p < 0.05, SIGNIFICANT)
- HF: F=2.32, p=0.052 (p > 0.05, not significant)
- HFNU: F=3.55, p=0.007 (p < 0.05, SIGNIFICANT)
- SD1: F=3.40, p=0.009 (p < 0.05, SIGNIFICANT)
--- Mission Time Trend Analysis (Correlation) ---
This test checks if metrics generally increased or decreased over the course of the mission for the crew as a whole.
- RMSSD: Spearman r=-0.279 (p=0.094, p > 0.05, not significant)
- PNNI_50: Spearman r=-0.308 (p=0.064, p > 0.05, not significant)
- PNNI_20: Spearman r=-0.241 (p=0.150, p > 0.05, not significant)
- HF: Spearman r=-0.214 (p=0.203, p > 0.05, not significant)
- HFNU: Spearman r=-0.318 (p=0.055, p > 0.05, not significant)
- SD1: Spearman r=-0.278 (p=0.096, p > 0.05, not significant)
--- Non-Linear Trend Analysis (GAM) ---
GAMs were used to explore complex, non-linear patterns in HRV metrics over the mission duration.
The plots generated by this analysis visualize the average trend for the crew, revealing periods of increase, decrease, or stability that a simple linear model might miss.
Interpretation should be based on visual inspection of the GAM plots: a significant curve or 'wiggle' suggests a non-linear relationship with time.
--- Pairwise Crew Comparisons (Tukey's HSD) ---
For metrics where ANOVA was significant, this test identifies which specific crew members differed from each other.
Significant differences for RMSSD:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura -2.56 0.9993 -17.9577 12.8377 False
T01_Mara T03_Nancy 0.285 1.0 -13.6428 14.2128 False
T01_Mara T04_Michelle -3.1225 0.9952 -17.0503 10.8053 False
T01_Mara T05_Felicitas 13.15 0.0293 0.8669 25.4331 True
T01_Mara T06_Mara_Selena -1.345 1.0 -13.6281 10.9381 False
T01_Mara T07_Geraldinn -2.105 0.9996 -16.0328 11.8228 False
T01_Mara T08_Karina -5.495 0.9713 -23.4757 12.4857 False
T02_Laura T03_Nancy 2.845 0.9993 -14.526 20.216 False
T02_Laura T04_Michelle -0.5625 1.0 -17.9335 16.8085 False
T02_Laura T05_Felicitas 15.71 0.0593 -0.3724 31.7924 False
T02_Laura T06_Mara_Selena 1.215 1.0 -14.8674 17.2974 False
T02_Laura T07_Geraldinn 0.455 1.0 -16.916 17.826 False
T02_Laura T08_Karina -2.935 0.9997 -23.6973 17.8273 False
T03_Nancy T04_Michelle -3.4075 0.9966 -19.4899 12.6749 False
T03_Nancy T05_Felicitas 12.865 0.1193 -1.8161 27.5461 False
T03_Nancy T06_Mara_Selena -1.63 1.0 -16.3111 13.0511 False
T03_Nancy T07_Geraldinn -2.39 0.9997 -18.4724 13.6924 False
T03_Nancy T08_Karina -5.78 0.977 -25.4768 13.9168 False
T04_Michelle T05_Felicitas 16.2725 0.0218 1.5914 30.9536 True
T04_Michelle T06_Mara_Selena 1.7775 0.9999 -12.9036 16.4586 False
T04_Michelle T07_Geraldinn 1.0175 1.0 -15.0649 17.0999 False
T04_Michelle T08_Karina -2.3725 0.9999 -22.0693 17.3243 False
T05_Felicitas T06_Mara_Selena -14.495 0.0226 -27.6262 -1.3638 True
T05_Felicitas T07_Geraldinn -15.255 0.0373 -29.9361 -0.5739 True
T05_Felicitas T08_Karina -18.645 0.0485 -37.2153 -0.0747 True
T06_Mara_Selena T07_Geraldinn -0.76 1.0 -15.4411 13.9211 False
T06_Mara_Selena T08_Karina -4.15 0.9953 -22.7203 14.4203 False
T07_Geraldinn T08_Karina -3.39 0.9991 -23.0868 16.3068 False
-----------------------------------------------------------------------
Significant differences for PNNI_50:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=====================================================================
group1 group2 meandiff p-adj lower upper reject
---------------------------------------------------------------------
T01_Mara T02_Laura -0.1483 1.0 -4.063 3.7663 False
T01_Mara T03_Nancy 0.3775 1.0 -3.1634 3.9184 False
T01_Mara T04_Michelle -0.3125 1.0 -3.8534 3.2284 False
T01_Mara T05_Felicitas 3.225 0.0391 0.1022 6.3478 True
T01_Mara T06_Mara_Selena 0.2717 1.0 -2.8511 3.3945 False
T01_Mara T07_Geraldinn -0.2375 1.0 -3.7784 3.3034 False
T01_Mara T08_Karina -0.415 1.0 -4.9863 4.1563 False
T02_Laura T03_Nancy 0.5258 0.9999 -3.8905 4.9421 False
T02_Laura T04_Michelle -0.1642 1.0 -4.5805 4.2521 False
T02_Laura T05_Felicitas 3.3733 0.1658 -0.7154 7.462 False
T02_Laura T06_Mara_Selena 0.42 1.0 -3.6687 4.5087 False
T02_Laura T07_Geraldinn -0.0892 1.0 -4.5055 4.3271 False
T02_Laura T08_Karina -0.2667 1.0 -5.5451 5.0118 False
T03_Nancy T04_Michelle -0.69 0.9992 -4.7787 3.3987 False
T03_Nancy T05_Felicitas 2.8475 0.2399 -0.8849 6.5799 False
T03_Nancy T06_Mara_Selena -0.1058 1.0 -3.8383 3.6266 False
T03_Nancy T07_Geraldinn -0.615 0.9996 -4.7037 3.4737 False
T03_Nancy T08_Karina -0.7925 0.9995 -5.8001 4.2151 False
T04_Michelle T05_Felicitas 3.5375 0.073 -0.1949 7.2699 False
T04_Michelle T06_Mara_Selena 0.5842 0.9995 -3.1483 4.3166 False
T04_Michelle T07_Geraldinn 0.075 1.0 -4.0137 4.1637 False
T04_Michelle T08_Karina -0.1025 1.0 -5.1101 4.9051 False
T05_Felicitas T06_Mara_Selena -2.9533 0.1128 -6.2917 0.3851 False
T05_Felicitas T07_Geraldinn -3.4625 0.0841 -7.1949 0.2699 False
T05_Felicitas T08_Karina -3.64 0.2291 -8.3612 1.0812 False
T06_Mara_Selena T07_Geraldinn -0.5092 0.9998 -4.2416 3.2233 False
T06_Mara_Selena T08_Karina -0.6867 0.9997 -5.4079 4.0345 False
T07_Geraldinn T08_Karina -0.1775 1.0 -5.1851 4.8301 False
---------------------------------------------------------------------
Significant differences for PNNI_20:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura 0.6396 1.0 -10.921 12.2002 False
T01_Mara T03_Nancy 3.3688 0.9619 -7.0882 13.8257 False
T01_Mara T04_Michelle 0.2713 1.0 -10.1857 10.7282 False
T01_Mara T05_Felicitas 9.8429 0.03 0.6207 19.0651 True
T01_Mara T06_Mara_Selena 1.5379 0.9993 -7.6843 10.7601 False
T01_Mara T07_Geraldinn 1.3563 0.9999 -9.1007 11.8132 False
T01_Mara T08_Karina -1.6837 0.9999 -15.1836 11.8161 False
T02_Laura T03_Nancy 2.7292 0.9969 -10.313 15.7713 False
T02_Laura T04_Michelle -0.3683 1.0 -13.4105 12.6738 False
T02_Laura T05_Felicitas 9.2033 0.2408 -2.8713 21.278 False
T02_Laura T06_Mara_Selena 0.8983 1.0 -11.1763 12.973 False
T02_Laura T07_Geraldinn 0.7167 1.0 -12.3255 13.7588 False
T02_Laura T08_Karina -2.3233 0.9996 -17.9117 13.265 False
T03_Nancy T04_Michelle -3.0975 0.9893 -15.1722 8.9772 False
T03_Nancy T05_Felicitas 6.4742 0.5518 -4.5484 17.4968 False
T03_Nancy T06_Mara_Selena -1.8308 0.9993 -12.8534 9.1918 False
T03_Nancy T07_Geraldinn -2.0125 0.9993 -14.0872 10.0622 False
T03_Nancy T08_Karina -5.0525 0.9483 -19.8409 9.7359 False
T04_Michelle T05_Felicitas 9.5717 0.1257 -1.4509 20.5943 False
T04_Michelle T06_Mara_Selena 1.2667 0.9999 -9.7559 12.2893 False
T04_Michelle T07_Geraldinn 1.085 1.0 -10.9897 13.1597 False
T04_Michelle T08_Karina -1.955 0.9998 -16.7434 12.8334 False
T05_Felicitas T06_Mara_Selena -8.305 0.1487 -18.1639 1.5539 False
T05_Felicitas T07_Geraldinn -8.4867 0.2305 -19.5093 2.5359 False
T05_Felicitas T08_Karina -11.5267 0.164 -25.4693 2.416 False
T06_Mara_Selena T07_Geraldinn -0.1817 1.0 -11.2043 10.8409 False
T06_Mara_Selena T08_Karina -3.2217 0.9943 -17.1643 10.721 False
T07_Geraldinn T08_Karina -3.04 0.9972 -17.8284 11.7484 False
-----------------------------------------------------------------------
Significant differences for HFNU:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura -2.7446 0.8683 -9.3362 3.847 False
T01_Mara T03_Nancy -4.5238 0.2456 -10.4861 1.4386 False
T01_Mara T04_Michelle -6.6712 0.02 -12.6336 -0.7089 True
T01_Mara T05_Felicitas -2.1363 0.8818 -7.3945 3.122 False
T01_Mara T06_Mara_Selena -6.3529 0.0097 -11.6112 -1.0947 True
T01_Mara T07_Geraldinn -4.8713 0.1742 -10.8336 1.0911 False
T01_Mara T08_Karina -5.7813 0.2563 -13.4786 1.9161 False
T02_Laura T03_Nancy -1.7792 0.9929 -9.2155 5.6571 False
T02_Laura T04_Michelle -3.9267 0.6735 -11.363 3.5096 False
T02_Laura T05_Felicitas 0.6083 1.0 -6.2764 7.493 False
T02_Laura T06_Mara_Selena -3.6083 0.6814 -10.493 3.2764 False
T02_Laura T07_Geraldinn -2.1267 0.9801 -9.563 5.3096 False
T02_Laura T08_Karina -3.0367 0.9483 -11.9248 5.8514 False
T03_Nancy T04_Michelle -2.1475 0.9679 -9.0322 4.7372 False
T03_Nancy T05_Felicitas 2.3875 0.9131 -3.8973 8.6723 False
T03_Nancy T06_Mara_Selena -1.8292 0.978 -8.114 4.4557 False
T03_Nancy T07_Geraldinn -0.3475 1.0 -7.2322 6.5372 False
T03_Nancy T08_Karina -1.2575 0.9996 -9.6895 7.1745 False
T04_Michelle T05_Felicitas 4.535 0.3007 -1.7498 10.8198 False
T04_Michelle T06_Mara_Selena 0.3183 1.0 -5.9665 6.6032 False
T04_Michelle T07_Geraldinn 1.8 0.988 -5.0847 8.6847 False
T04_Michelle T08_Karina 0.89 1.0 -7.542 9.322 False
T05_Felicitas T06_Mara_Selena -4.2167 0.2577 -9.838 1.4047 False
T05_Felicitas T07_Geraldinn -2.735 0.841 -9.0198 3.5498 False
T05_Felicitas T08_Karina -3.645 0.8036 -11.5947 4.3047 False
T06_Mara_Selena T07_Geraldinn 1.4817 0.9935 -4.8032 7.7665 False
T06_Mara_Selena T08_Karina 0.5717 1.0 -7.3781 8.5214 False
T07_Geraldinn T08_Karina -0.91 1.0 -9.342 7.522 False
-----------------------------------------------------------------------
Significant differences for SD1:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=======================================================================
group1 group2 meandiff p-adj lower upper reject
-----------------------------------------------------------------------
T01_Mara T02_Laura -1.8121 0.9993 -12.7001 9.0759 False
T01_Mara T03_Nancy 0.2012 1.0 -9.6473 10.0498 False
T01_Mara T04_Michelle -2.2062 0.9952 -12.0548 7.6423 False
T01_Mara T05_Felicitas 9.2979 0.0293 0.6123 17.9835 True
T01_Mara T06_Mara_Selena -0.9521 1.0 -9.6377 7.7335 False
T01_Mara T07_Geraldinn -1.4888 0.9996 -11.3373 8.3598 False
T01_Mara T08_Karina -3.8837 0.9713 -16.5982 8.8307 False
T02_Laura T03_Nancy 2.0133 0.9993 -10.27 14.2966 False
T02_Laura T04_Michelle -0.3942 1.0 -12.6775 11.8891 False
T02_Laura T05_Felicitas 11.11 0.0592 -0.2621 22.4821 False
T02_Laura T06_Mara_Selena 0.86 1.0 -10.5121 12.2321 False
T02_Laura T07_Geraldinn 0.3233 1.0 -11.96 12.6066 False
T02_Laura T08_Karina -2.0717 0.9998 -16.753 12.6097 False
T03_Nancy T04_Michelle -2.4075 0.9966 -13.7796 8.9646 False
T03_Nancy T05_Felicitas 9.0967 0.1193 -1.2846 19.4779 False
T03_Nancy T06_Mara_Selena -1.1533 1.0 -11.5346 9.2279 False
T03_Nancy T07_Geraldinn -1.69 0.9997 -13.0621 9.6821 False
T03_Nancy T08_Karina -4.085 0.9771 -18.0129 9.8429 False
T04_Michelle T05_Felicitas 11.5042 0.0218 1.1229 21.8854 True
T04_Michelle T06_Mara_Selena 1.2542 0.9999 -9.1271 11.6354 False
T04_Michelle T07_Geraldinn 0.7175 1.0 -10.6546 12.0896 False
T04_Michelle T08_Karina -1.6775 0.9999 -15.6054 12.2504 False
T05_Felicitas T06_Mara_Selena -10.25 0.0226 -19.5353 -0.9647 True
T05_Felicitas T07_Geraldinn -10.7867 0.0373 -21.1679 -0.4054 True
T05_Felicitas T08_Karina -13.1817 0.0486 -26.3131 -0.0503 True
T06_Mara_Selena T07_Geraldinn -0.5367 1.0 -10.9179 9.8446 False
T06_Mara_Selena T08_Karina -2.9317 0.9953 -16.0631 10.1997 False
T07_Geraldinn T08_Karina -2.395 0.9991 -16.3229 11.5329 False
-----------------------------------------------------------------------
================================================================================
END OF REPORT
================================================================================
# =================================================================================
# Cell: Next-Step Recommendations - Analysis and Interpretation
# =================================================================================
# This notebook cell provides a complete workflow to address the key next-step
# recommendations from the initial analysis.
#
# It is structured to directly investigate:
# 1. Recommendation 1: Visualize Trajectories
# - Implemented by running Generalized Additive Models (GAMs) to plot
# non-linear individual and group trends over time.
#
# 2. Recommendation 2: Model Multivariate Links
# - Implemented by first plotting a correlation heatmap to identify candidate
# links, followed by a demonstrative multivariate mixed-effects model.
#
# 3. Recommendation 3: Validate Externally
# - Addressed in a detailed markdown section at the end, explaining the
# concept and identifying the key finding to validate in future studies.
# =================================================================================
# --- 1. Imports and Setup ---
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pygam import GAM, s, f
import statsmodels.formula.api as smf
import warnings
warnings.filterwarnings('ignore')
# --- 2. Plotting Style Configuration ---
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl", 8)
plt.rcParams['figure.figsize'] = (16, 10)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.titlesize'] = 16
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 11
plt.rcParams['figure.dpi'] = 100
# --- 3. The AdvancedAnalysis Class ---
class AdvancedAnalysis:
"""
A class to encapsulate the advanced analysis workflows based on the
next-step recommendations.
"""
def __init__(self, data_path=(r"C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv")):
self.data_path = data_path
self.data = None
self.parasympathetic_metrics = ['rmssd', 'pnni_50', 'pnni_20', 'hf', 'hfnu', 'sd1']
self.crew_names = {
'T01_Mara': 'Mara', 'T02_Laura': 'Laura', 'T03_Nancy': 'Nancy',
'T04_Michelle': 'Michelle', 'T05_Felicitas': 'Felicitas',
'T06_Mara_Selena': 'Mara Selena', 'T07_Geraldinn': 'Geraldinn', 'T08_Karina': 'Karina'
}
self.load_data()
def load_data(self):
"""Loads and prepares the data."""
print("--- Loading and Preparing Data ---")
try:
self.data = pd.read_csv(self.data_path)
self.data['Subject'] = self.data['Subject'].astype(str)
self.data['Sol'] = self.data['Sol'].astype(int)
self.data['Crew_Name'] = self.data['Subject'].map(self.crew_names)
self.data['Subject_code'] = self.data['Subject'].astype('category').cat.codes
print(f"✓ Data loaded successfully from {self.data_path}")
except Exception as e:
print(f"✗ An error occurred during data loading: {e}")
def visualize_trajectories(self):
"""
Recommendation 1: Visualize Trajectories using GAMs.
This method plots non-linear trends for each metric, which clearly
shows individual trajectories and the overall group trend.
"""
print("\n\n" + "="*80)
print("### Recommendation 1: Visualize Trajectories ###")
print("Using Generalized Additive Models (GAMs) to visualize non-linear trends.")
print("="*80)
for metric in self.parasympathetic_metrics:
if metric not in self.data.columns: continue
print(f"\n--- GAM Analysis for: {metric.upper()} ---")
try:
model_data = self.data[['Sol', 'Subject_code', 'Crew_Name', metric]].dropna()
if len(model_data) < 15:
print("Not enough data to fit model.")
continue
gam = GAM(s(0, n_splines=10) + f(1)).fit(model_data[['Sol', 'Subject_code']], model_data[metric])
gam.summary()
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
XX = gam.generate_X_grid(term=0)
pdep, confi = gam.partial_dependence(term=0, X=XX, width=0.95)
ax.plot(XX[:, 0], pdep, color='royalblue', linewidth=3, label='GAM Group Trend')
ax.fill_between(XX[:, 0], confi[:, 0], confi[:, 1], color='cornflowerblue', alpha=0.3, label='95% Confidence Interval')
sns.scatterplot(x='Sol', y=metric, hue='Crew_Name', data=model_data, ax=ax, alpha=0.7, s=60)
ax.set_title(f"Individual Trajectories and Group Trend for {metric.upper()}", fontsize=16, fontweight='bold')
ax.set_xlabel("Sol (Mission Day)")
ax.set_ylabel(f"{metric.upper()} Value")
ax.legend(title='Subject', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()
except Exception as e:
print(f"Could not fit GAM for {metric}: {e}")
def model_multivariate_links(self):
"""
Recommendation 2: Model Multivariate Links.
Step 1: Use a heatmap to find strongly correlated metrics.
Step 2: Run a demonstrative multivariate mixed-effects model.
"""
print("\n\n" + "="*80)
print("### Recommendation 2: Model Multivariate Links ###")
print("="*80)
# --- Step 2a: Identify potential links with a correlation heatmap ---
print("\n--- Step 2a: Identifying links with a Correlation Heatmap ---")
corr_matrix = self.data[self.parasympathetic_metrics].corr(method='spearman')
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='viridis', fmt='.2f', linewidths=.5)
plt.title('Spearman Correlation Heatmap of Parasympathetic Metrics', fontsize=16, fontweight='bold')
plt.show()
# --- Step 2b: Demonstrate a multivariate mixed-effects model ---
print("\n--- Step 2b: Demonstrating a Multivariate Mixed-Effects Model ---")
print("Based on the heatmap, RMSSD and SD1 are almost perfectly correlated (r > 0.99).")
print("Let's build a model to see how SD1 predicts RMSSD, while controlling for time (Sol) and individual subjects.")
try:
# Formula: Predict RMSSD based on Sol and SD1, with a random intercept for each Subject
model_formula = "rmssd ~ Sol + sd1"
model = smf.mixedlm(model_formula, self.data, groups=self.data["Subject"])
result = model.fit()
print("\n--- Multivariate Model Summary (RMSSD ~ Sol + SD1) ---")
print(result.summary())
print("\n--- Interpretation ---")
sd1_coef = result.params['sd1']
sd1_p_value = result.pvalues['sd1']
print(f"The coefficient for sd1 is {sd1_coef:.4f} with a p-value of {sd1_p_value:.4E}.")
if sd1_p_value < 0.05:
print("This is highly significant, confirming that SD1 is a very strong predictor of RMSSD, even when accounting for mission day and individual differences.")
print("This result is expected, as they are mathematically related metrics, but it successfully demonstrates the multivariate modeling approach.")
else:
print("The relationship between SD1 and RMSSD was not significant in this model, which would be an unexpected finding.")
except Exception as e:
print(f"Could not fit multivariate model: {e}")
def explain_external_validation(self):
"""
Recommendation 3: Explain External Validation.
This method prints a detailed explanation of this conceptual step.
"""
print("\n\n" + "="*80)
print("### Recommendation 3: External Validation (A Conceptual Framework) ###")
print("="*80)
print("""
This recommendation is conceptual and does not involve running new code on the current dataset. Instead, it outlines a critical next step for ensuring the research is robust and generalizable.
**What is External Validation?**
External validation is the process of testing your findings on a completely new, independent dataset. In this context, it would mean repeating the key parts of this analysis on data collected from a different analog mission with a different set of crew members.
**Why is it Important?**
It confirms that your findings are not just a coincidence or an idiosyncrasy of the specific 8 individuals in this study. If the same patterns emerge in a new cohort, it provides strong evidence that you have uncovered a genuine physiological phenomenon related to the analog mission environment.
**What is the KEY FINDING to Validate from This Study?**
Based on the ANOVA, Tukey HSD, and GAM results, the most powerful and statistically significant finding from your current dataset is the **high degree of inter-individual variability**.
- The ANOVA tests were significant for most metrics (RMSSD, pNN50, pNN20, hfnu, SD1).
- The Tukey HSD tests pinpointed which specific individuals were different from others (e.g., T05_Felicitas having significantly higher RMSSD than 5 other crew members).
- The GAM analysis also showed that the 'Subject' factor was highly significant.
**How to Validate This Finding:**
1. Collect HRV data from a new cohort of analog astronauts.
2. Run the same One-Way ANOVA analysis on the new data.
3. **Validation Check:** If the ANOVA results are again significant (p < 0.05), you have externally validated the finding that individual differences are a major driver of parasympathetic HRV in this environment. This strengthens the conclusion that "one size does not fit all" and that personalized physiological monitoring is crucial.
""")
# --- 5. Main Execution Block ---
# Create an instance of the analyzer and run all the recommended analyses.
analyzer = AdvancedAnalysis()
if analyzer.data is not None:
# Run and display the analysis for each recommendation
analyzer.visualize_trajectories()
analyzer.model_multivariate_links()
analyzer.explain_external_validation()
else:
print("\nAnalysis could not proceed because data failed to load.")
--- Loading and Preparing Data ---
✓ Data loaded successfully from C:\Users\User\OneDrive\FAC\Research\Valquiria\Data\working_folder\hrv_results\hrv_complete.csv
================================================================================
### Recommendation 1: Visualize Trajectories ###
Using Generalized Additive Models (GAMs) to visualize non-linear trends.
================================================================================
--- GAM Analysis for: RMSSD ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -178.2912
Number of Samples: 37 AIC: 379.8952
AICc: 392.0156
GCV: 98.0213
Scale: 49.0325
Pseudo R-Squared: 0.4969
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 7.65e-01
f(1) [0.6] 8 5.3 8.65e-03 **
intercept 1 0.0 2.03e-06 ***
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
--- GAM Analysis for: PNNI_50 ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -80.8561
Number of Samples: 37 AIC: 185.025
AICc: 197.1454
GCV: 6.3395
Scale: 3.1712
Pseudo R-Squared: 0.4271
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 8.94e-01
f(1) [0.6] 8 5.3 4.11e-02 *
intercept 1 0.0 1.70e-01
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
--- GAM Analysis for: PNNI_20 ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -156.9732
Number of Samples: 37 AIC: 337.2592
AICc: 349.3795
GCV: 54.7781
Scale: 27.4013
Pseudo R-Squared: 0.4209
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 9.49e-01
f(1) [0.6] 8 5.3 4.71e-02 *
intercept 1 0.0 2.86e-03 **
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
--- GAM Analysis for: HF ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -352.976
Number of Samples: 37 AIC: 729.2647
AICc: 741.3851
GCV: 11087.6823
Scale: 5546.3138
Pseudo R-Squared: 0.4016
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 8.62e-01
f(1) [0.6] 8 5.3 5.52e-02 .
intercept 1 0.0 1.63e-01
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
--- GAM Analysis for: HFNU ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -110.8589
Number of Samples: 37 AIC: 245.0305
AICc: 257.1509
GCV: 15.2293
Scale: 7.618
Pseudo R-Squared: 0.5816
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 8.23e-02 .
f(1) [0.6] 8 5.3 4.28e-03 **
intercept 1 0.0 1.88e-09 ***
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
--- GAM Analysis for: SD1 ---
GAM
=============================================== ==========================================================
Distribution: NormalDist Effective DoF: 10.6564
Link Function: IdentityLink Log Likelihood: -152.9113
Number of Samples: 37 AIC: 329.1353
AICc: 341.2557
GCV: 49.0078
Scale: 24.5148
Pseudo R-Squared: 0.4969
==========================================================================================================
Feature Function Lambda Rank EDoF P > x Sig. Code
================================= ==================== ============ ============ ============ ============
s(0) [0.6] 10 5.4 7.65e-01
f(1) [0.6] 8 5.3 8.65e-03 **
intercept 1 0.0 2.03e-06 ***
==========================================================================================================
Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
which can cause p-values to appear significant when they are not.
WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
known smoothing parameters, but when smoothing parameters have been estimated, the p-values
are typically lower than they should be, meaning that the tests reject the null too readily.
================================================================================ ### Recommendation 2: Model Multivariate Links ### ================================================================================ --- Step 2a: Identifying links with a Correlation Heatmap ---
--- Step 2b: Demonstrating a Multivariate Mixed-Effects Model ---
Based on the heatmap, RMSSD and SD1 are almost perfectly correlated (r > 0.99).
Let's build a model to see how SD1 predicts RMSSD, while controlling for time (Sol) and individual subjects.
--- Multivariate Model Summary (RMSSD ~ Sol + SD1) ---
Mixed Linear Model Regression Results
======================================================
Model: MixedLM Dependent Variable: rmssd
No. Observations: 37 Method: REML
No. Groups: 8 Scale: 0.0000
Min. group size: 2 Log-Likelihood: 121.4315
Max. group size: 8 Converged: Yes
Mean group size: 4.6
------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
------------------------------------------------------
Intercept -0.001 0.002 -0.459 0.646 -0.006 0.004
Sol 0.000 0.000 0.185 0.853 -0.000 0.000
sd1 1.414 0.000 9573.291 0.000 1.414 1.415
Group Var 0.000
======================================================
--- Interpretation ---
The coefficient for sd1 is 1.4142 with a p-value of 0.0000E+00.
This is highly significant, confirming that SD1 is a very strong predictor of RMSSD, even when accounting for mission day and individual differences.
This result is expected, as they are mathematically related metrics, but it successfully demonstrates the multivariate modeling approach.
================================================================================
### Recommendation 3: External Validation (A Conceptual Framework) ###
================================================================================
This recommendation is conceptual and does not involve running new code on the current dataset. Instead, it outlines a critical next step for ensuring the research is robust and generalizable.
**What is External Validation?**
External validation is the process of testing your findings on a completely new, independent dataset. In this context, it would mean repeating the key parts of this analysis on data collected from a different analog mission with a different set of crew members.
**Why is it Important?**
It confirms that your findings are not just a coincidence or an idiosyncrasy of the specific 8 individuals in this study. If the same patterns emerge in a new cohort, it provides strong evidence that you have uncovered a genuine physiological phenomenon related to the analog mission environment.
**What is the KEY FINDING to Validate from This Study?**
Based on the ANOVA, Tukey HSD, and GAM results, the most powerful and statistically significant finding from your current dataset is the **high degree of inter-individual variability**.
- The ANOVA tests were significant for most metrics (RMSSD, pNN50, pNN20, hfnu, SD1).
- The Tukey HSD tests pinpointed which specific individuals were different from others (e.g., T05_Felicitas having significantly higher RMSSD than 5 other crew members).
- The GAM analysis also showed that the 'Subject' factor was highly significant.
**How to Validate This Finding:**
1. Collect HRV data from a new cohort of analog astronauts.
2. Run the same One-Way ANOVA analysis on the new data.
3. **Validation Check:** If the ANOVA results are again significant (p < 0.05), you have externally validated the finding that individual differences are a major driver of parasympathetic HRV in this environment. This strengthens the conclusion that "one size does not fit all" and that personalized physiological monitoring is crucial.